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ABSTRACT 


This  project  is  a  multidisciplinary  efiort  between  3  Departments  and  Principal  Investigators.  It 
intends  to  combine:  pattern  recognition,  image  understanding  and  artificial  intelligence  techniques 
for  space-based  image  processing  as  well  as:  optical  and  digital  processing  methods.  Optical  feature 
extraction  and  sub-pixel  target  detection  and  tracking  results  are  summarzied.  Scene  representation 
and  modeling  work  using:  probabilistic  graph  matching,  multiple  resolution  rotation- invariant 
operators  and  texture  analysis  are  detailed.  Image  understanding  techniques  for  30  scene 
interpretation  discussed  include  20  image-level  methods  (using  features  such  as  edges,  lines  and 
corners)  and  30  scene-level  methods.  New  dynamic  programming,  stereo  image  and  model  building 
results  are  included. 

KBLWQRDS 

30  scene  interpretation,  artificial  intelligence,  feature  extraction,  hybrid  processors,  image 
understanding,  multiple  resolution  rotation  invariance,  optical/digital  processing,  probabalistic  graph 
matching,  space-based  imagery,  sub-pixel  targets,  texture  analysis,  time-change  imagery. 
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1.  INTRODUCTION 

1.1  OVERVIEW 

This  project  is  a  multidisciptinary  effort  intended  to  combine  methodologies  for  image  analysis  and 
interpretation,  and  evaluate  the  application  of  this  integrated  approach  to  problems  of  space-based 
imagery.  The  project  has  brought  together  research  teams  from  within  the  Departments  of  Electrical 
and  Computer  Engineering,  Computer  Science.  Robotics,  and  Biomedical  Engineering  of  CMU. 


We  have  chosen  time-varying  space-Qased  imagery  as  the  applications  domain  in  wfiich  to  evaluate 
our  integrated  approach.  The  two  aspects  of  this  domain  are  described  below: 

•  Spaca-based  imagery  involves  targe  amounts  of  information  and  incorporates  both 
structural  and  textural  properties  of  a  scene.  Efficient  detection  and  representation  of 
information  in  the  scene  are  essential  not  only  to  interpretation  but  also  to  the  storage 
and  transmiaaion  of  information.  Scenes  are  predominantly  two-dimensional  although 
light  and  shadows  affect  imaging  of  both  structures  and  texture,  and  interpretation  of 
scenes  at  increasingly  high  optical  resolution  will  require  three-dimensional  models. 

•  Interpretati6a  of  time-varying  data  is  a  primary  goal  of  space-based  image  analysis  and 
adds  an  additional  dimension  of  complexity  to  the  problem.  We  have  chosen  to  look  at 
three  time-frame  scenarios  which  require  somewhat  different  analysis  tools.  High  speed 
tracking  is  viewed  as  primarily  a  feature  extraction  proolem  and  has  been  approached 
using  optical  methods.  Medium  and  long-term  time  change  detection  must  be  based  on  a 
more  abstract  description  of  the  scene  and  methods  of  representation  and  model-based 
interpretation  must  be  brought  to  bear. 


Within  the  context  of  the  applications  domain,  we  have  addressed  the  following  methodological 
research  issues: 

•  Optical  feature  extraction  and  detection 

•  Structural  and  textural  representation  and  matching 

•  Model-based  image  interpretation 

•  Hybrid  digital/optical  computer  architectures 


These  issues  are  fundamental  to  imoiementation  and  performance  of  analysis  tools  which  could 
imbed  the  inherently  fast  and  parallel  oreprocessing  power  of  optical  techniques  into  a  system  which 
develops  artd  tests  hypotheses  about  scene  reoresentations  and  scene  models. 

In  Chapter  1  of  this  report,  we  provice  a  'nc'e  detailed  overview  of  the  conceptual  framework  of  our 
proposed  hybrid  optical/digital  syste^i  re'  ne  ;ne  space-based  image  processing  problem,  and 
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discuss  the  imp^aQce  of  this  work  to  Air  Force  technology  and  to  related  Air  Force  programs. 
Section  1 .5  provides  a  summary  of  our  research  up  to  this  year.  Section  1 .6  provides  a  summary  of 
our  current  year  of  research,  with  details  in  Chapters  2-5. 


1 .2  CONCEPTUAL  FRAMEWORK  FOR  HYBRID 
OPTICAL/DIGITAL  IMAGE  PROCESSING 

In  Rgure  M.  we  show  the  general  structure  for  our  proposed  hybrid  optical/cJiyitnl  system  using 
multiple  methodologies  for  understanding  space-based  images.  As  shown  in  Figure  i  - 1 .  input  images 
are  preprocessed  and  then  fed  to  parallel  optical  and  digital  channels  in  which  multiple  features  are 


■extracted.  A  parallel  image  modeling  system  is  also  shown  which  extracts  structural  descriptions  of 
the  image.  These  data  plus  image  registration  and  target  detection  information  obtained  from  an 
optical  correlator  channel  are  then  used  by  an  Al/IU  system  to  modify  the  parallel  input  processing 
chwnels,  to  assemble  and  interpret  a  time-history  track  file  on  objects  of  interest  in  the  image  and  to 
provide  the  necessary  textural  and  graphic  output  reports. 

1 .3  PROBLEM  DEFINITION 

Advanced  space-based  sensor  systems  will  provide  us  with  high  resolution  real-time  multisensor 
data  acquisition  in  the  near  future.  This  will  totally  pollute  present  processors  unless  we  address  how 
to  intelligently  and  timely  process  and  handle  the  projected  data  rates.  NASA  and  others  have  already 
verified  that  the  United  States  is  capable  of  collecting  more  data  than  we  can  intelligently  process 
(less  than  1%  of  all  NASA  data  has  even  been  looked  at  [Wilson  and  Silverman,  1979]). 

The  key  issue  in  Space-Based  Image  Understanding  (SBIU)  Is  not  to  transmit  every  frame  of  data 
(with  5000  X  5000  sensor  elements  in  three  bands  with  ten  bits  of  data  per  pixel,  and  a  30  frame/sec 
rate,  this  is  a  data  collection  rate  of  over  io'^  bits/sec).  No  existing  technology  can  accommodate 
such  a  high  data  collection  rate.  Therefore,  attention  should  be  given  to  the  algorithms  required  to 
achieve  this.  But  first,  here  are  several  facts  about  SBIU  problems: 

l.ln  apace-based  image  acquisition,  we  are  monitoring  certain  areas  and  regions  for 
diverse  well-defined  missions.  We  are  only  concerned  with  changes  and  do  not  need  to 
know  that  nothing  new  has  occurred  in  the  image  being  looked  at.  When  we  transmit 
only  the  associated  cnange  niormmon,  we  achieve  a  quite  significant  btndwidth 
nduction.  Thus,  we  should  process  the  data  from  space-based  sensors  on-board  the 
platforms,  determine  image  cnanges  on-line,  interpret  the  results  and  transmit  only 
textural  and  graphic  output  reports. 

2.  We  know  rather  well  where  the  sate'iite  'S  and  where  it  is  looking  and  we  know  that  the 
scene  being  imaged  correlates  .vitn  :he  prior  image  frame  or  with  our  stored  reference. 


H3XSAS  IV/nt' 
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Th«  problenys  thus  different  from  the  often  discussed  unbounded  and  unsupervised 
target  rec^nition  problem.  We  can  and  must  utilize  this  a  priori  m formation  that  the 
frame  we  ore  investigating  correlates  wuh  a  previous  one  m  our  processing  algorithms. 

3.  To  provide  better  image  registration  accuracy  and  to  facilitate  pointing  of  secondary 
sensors  at  given  areas  of  interest,  it  is  often  necessary  to  locate  key  landmarks  in  the 
image.  This  is  also  useful  in  determining  geometrical  corrections  needed. 

4.  It  is  also  useful  and  necessary  to  register  two  successive  image  frames  for  in  ter -frame 
integration  to  decrease  the  variance  of  the  noise  and  to  improve  the  image  quality.  This 
is  essential  to  accommodate  platform  variations  with  time  and  bacKground  drift.  Often. 
sub-pixel  image  registration  is  necessary. 

5.  It  is  obviously  essential  to  subtract  succsssive  frames  since  this  provides  the  necessary 
change  detection  or  time-varying  target  data. 

6.  However,  in  moat  cases,  the  image  registration  in  (4)  is  sub-pixel  and  thus  before 
performing  (S).  we  must  interpolate  the  images. 

7.  Once  ttme-hlstory  track  files  of  candidate  objects  of  interest  in  the  field-of-view  of  the 
sensors  have  been  obtained,  a  multitude  of  discrimination  analysis  techniques,  Al,  lU, 
pattern  recognition  and  human  perception  algorithms  are  necessary  to  classify, 
understand  and  interpret  the  time-change  activity  noted. 

8.  In  advanced  sensor  systems.  3-0  information  on  the  scenes  will  be  available  from  stereo 
satellites  or  other  techniques.  In  such  cases,  we  can  fully  capitalize  on  the  available 
image  information  only  by  the  use  of  advanced  3-0  scene  modeling  and  interpretation. 

The  key  point  is  the  extraction  of  scene  information  (3-0)  from  time-histories  of  2-0 
images. 

9.  To  detect  and  describe  detailed  changes  in  the  3-0  structure  of  scenes,  it  is  useful  to  first 
generate  3-0  scene  descriptions  from  the  2^0  images,  and  then  to  compare  the 
descriptions  for  changes.  Conventional  2-0  change  detection  approaches  are  not  as 
useful  for  high  resolution  images  of  complex  scenes  since  they  do  not  take  into  account 
factors  such  as  different  viewpoints  and  different  lighting  conditions  for  the  different 
images  of  the  scene.  In  order  to  detect  changes  over  successive  images  of  a  given  scene 
obtained  over  time,  it  is  useful  to  maintain  a  3-0  model  of  the  scene  and  automatically 
update  the  model  as  changes  occur.  This  requires  the  ability  to  match  the  model  with 
each  new  view  of  the  scene.  Matching  in  3-0  is  more  desirable  than  matching  in  2-0 
since  the  3-0  information  is  represented  in  a  manner  that  is  independent  of  viewpoint  and 
lighting  conditlona. 

10.  The  3-D  scene  model  is  a  useful  central  component  for  many  aspects  of  the  change 
detection  task.  Not  only  is  it  useful  for  determining  whether  changes  have  occurred,  but 
it  also  permits  model-based  interpretation  of  new  images  and  serves  as  a  central 
representation  for  accumulating  3-0  scene  information  from  various  low-level  experts. 

Our  new  research  addresses  these  aspects  of  time-history  3-0  scene  information. 

Items  1-6  address  the  high  throughput  signal  processing  aspects  of  SBIU,  whereas  items  7-10 
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address  tlw  advanqgd  image  understanding  aspects  of  this  problem.  Table  1-1  summarized 
objectives  which  must  be  attained  to  achieve  the  overall  goal  of  SBIU.  in  Table  i  2.  techniques 
required  to  attain  these  objectives  are  listed,  and  Table  i  -3  lists  the  disciplines  which  will  contribute  to 
the  achievement  of  our  goals.  As  well  as  image  processing  per  se,  we  must  study  the  importance  of 
efficient  damhese  organization  and  manipulation  since  storage  or  transmission  of  a  very  large 
database  will  be  required  for  SBIU. 

To  properly  address  understanding  of  time-varying  space-based  images,  we  feel  that  three  different 
SBIU  time-varying  image  processing  scenarios  (Table  1-4)  must  be  separately  addressed.  We 
propose  to  study  each  of  these  during  the  course  of  our  research.  Wc  distinguish  the  three  cases  by 
the  change  rate  and  the  domain  of  analysis.  In  the  first  case  (rapid  time- variations),  we  can  consider 
a  missile  launch.  In  this  application,  the  objective  is  to  track  the  time-history  of  the  missile  and  to 
transmit  the  information  that  a  missile  has  been  launched  (from  subsequent  sensors,  the  missile's 
trajectory  etc.  can  be  obtained  from  our  system  techniques  and  algorithms).  The  second  case 
(medium  time-variations)  can  concern  monitoring  of  key  sites  such  as  airports,  railroads  and  harbors 
and  know  areas  of  anticipated  concentrations  of  troops  or  armor.  In  this  case,  troop  or  armor 
movement  and  alr.iand  and  sea  activity  can  be  obtained  from  time-varying  image  data.  This  second 
scenario  is  typical  of  s  case  in  which  extensive  Al  and  lU  techniques  are  appropriate  (f.e.,  the  use  of 
information  on  the  locations  of  hangers,  runways,  r^lroad  tracks,  terminals,  switching  yards,  harbor 
channels,  docks,  piers,  etc.).  This  also  requires  the  locations  and  registration  of  these  items  in 
sequential  image  frames.  The  third  case  (slow  time-variations)  addresses  urban  development  and 
agricultural  or  land  use  activity  (as  in  Landsat  and  ERTS  case-studies). 

Table  1-1:  Objectives  of  Space-Based  Image  Processing 

•  OeHction  o(  imege  cnangM 

•  CM  of  «  ertort  knovvWdga 

•  Loeatton  of  hay  landmarks 

•  Tbna-hisiary  track  tiia  acquiaition 

•  hHsrprataiion  o(  timahiaiory  data 

•  3-0  aeana  mtarorataiion 

•  Bflelant  storad#  and  'ctnavai  of  information  from  Jawbaaa 

The  three  scenarios  noted  in  Table  i  constitute  our  definition  of  the  SBIU  problem.  All  cases 
require  the  techniques  and  disciplines  noted  m  Tables  i-2  and  i-3.  The  first  case  (rapid  time- 
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Tal^  1-2:  Image  Processing  Techniques  Required  for  SBIU 

•  Image  efiltODcemeni  and  pfetxocnsaing 

o  Image  legmtratieit  (sub-piiiei)  lor  Irame  integration 
o  Image  aubtracuon  for  timebistory  extraction 
o  Image  interpolation  lor  image  subtraction 
e  Image  segmentation 
e  Feature  extraction 
e  Image  modeling 

•  3-0  acene  modeling  and  interpretation 
e  HierarcMeal  detabeoe  deeign 


Table  1<3:  Oisciplines  Required  to  Achieve  Reai*Time  Space*Based  Image  Processing 

e  Pattern  recognition 
e  Image  underatandins 
e  Human  perception 
e  Artificiai  inteiRgence 
e  Optieai  Proceaoina 
e  Olgitei  Proceaelng 

Table  1  -4:  Time-Change  Scenarios 


TIMeCHANQI 


EXAMPUS 

MaeiieUuneD 

Reiiroed,  Airport,  Harbor,  Troope,  Armor 
Agricultural,  Land-use,  Urban  Development 


DOMAIN  OF  ANALYSIS 

Image  Pixeia 
Scene  Structure 
Statistical  Image  Modeling 


variations)  requires  primarily  sub- pixel  image  registration,  frame  integration,  frame  interpolation,  and 
image  differencing.  The  second  case  requires  techniques  involving  image  interpretation,  3-0  scene 
modeling,  3-0  matching  and  comparison,  plus  knowledge-based  geometric  reasoning.  The  third  case 


needs  more  statistical  techniques  and  stulisticai  image  models,  more  so  than  do  the  others.  All  cases 
require  object  and  scene  modeling,  image  preprocessing  and  enhancement  plus  segmentation, 
feature  extraction  and  classification.  Figure  M  depicts  these  aspects  and  the  interactive  multi¬ 
disciplinary  feedback  required  to  solve  these  S8iU  problems. 

1.4  BENEFIT  TO  AIR  FORCE  TECHNOLOGY 

With  our  three  scenario  problem  definition  (Table  1-4),  we  now  consider  tfie  myriad  of  Air  Force 
programs  and  technology  that  can  benefit  from  our  proposed  research.  First,  we  note  that  our 
research  is  directed  toward  the  development  of  new  algorithms  and  their  realization  in  a  hybrid 
optical/digital  architecture.  However,  devices  and  architectures  being  developed  in  related  Air  Force 
programs  in  VHSIC  and  VLSI,  systolic  array  processors,  Josephson  junction  devices,  etc.  can  also  be 
used  for  implementation  of  these  algorithms.  Our  work  will  thus  provide  problem  definition  and 
direction  regarding  algorithms  for  such  paralleT processor  architectures  and  technology  programs. 
Large  data  storage  requirements  and  studies  of  what  constitutes  a  valid  database  are  also  integral 
parts  of  this  program.  Similar  Air  Force  efforts  toward  data  storage  and  database  acquisition  are  thus 
of  direct  concern  to  this  program.  The  Air  Force  programs  in:  intelligent  sensors,  intelligent  task 
automation,  automated  manufacturing,  image  understanding,  human  perception  and  visual 
psychophysics  will  directly  benefit  from  the  inter-disciplinary  nature  of  our  research.  The  large  Air 
Force  effort  in  optical  data  processing  will  directly  benefit  since  real-time  spatial  light  modulators  and 
holographic  optical  elements  will  be  needed  for  implementation  of  our  algorithms  in  real-time.  The  Air 
Force  programs  in  missile  guidance  require  a  new  set  of  algorithms  and  attention  to  the  database 
requirements  and  performance  measures  used  and  thus  they  will  likewise  benefit  extensively  from  this 
program.  Oarpa/AF  programs  such  as  HALO  and  HICAMP  will  clearly  benefit  from  our  chosen 
time-varying  S8IU  tasks. 

The  monitoring  of  changes  and  developments  at  cultural  sites,  such  as  urban  areas  and  military 
bases,  is  a  very  useful  application  of  space-based  sensors.  The  techniques  we  develop  will  aid  in 
detecting  and  describing  both  large-scale  and  detailed  changes.  Furthermore,  the  techniques 
dealing  with  3-0  matching  and  comparison,  and  knowledge-based  geometric  reasoning  will  enhance 
Air  Force  programs  in  sensing  and  robotics. 


1 .5  SUMMARY  OF  RESEARCH  DONE  IN  YEAR  ONE 

In  our  first  year  of  research,  we  focused  on  the  development  and  evaluation  of  methods  which  yield 
representations  of  structural  and  textural  information  in  an  image,  and  relate  these  representations  to 
object  and  surface  contour  properties  of  the  scene.  The  techniques  studied  included  Probabilistic 
Graph  Matching,  Multiple  Resolution  Structural  Basis  Functions,  and  Textural  Sutlace  Models.  The 
structural  basis  function  and  texture  models  were  found  to  be  particularly  well  suited  to  parallel  or 
optical  processor  implementation.  Two  digital  processing  facilities  for  use  in  this  program  were  also 
assembled;  the  RAPIDbiis  architecture,  and  an  Optical  Data  Processing.  Digital  Processing  and 
Simulation  Facility. 

We  also  achieved  a  major  effort  on  the  extraction  of  time-varying  sub-pixel  targets  in  noise.  This 
time-change  scenario  concerns  applications  such  as  the  detection  of  missile  launches  or  aircraft  in 
flight.  In  the  first  year,  we  successfully  demonstrated  the  conceptual  ability  to  detect  and  track 
sub-pixel  targets. 

In  the  30  change  detection  task,  we  achieved  results  in  two  aspects:  the  low-level  problem  of 
analyzing  images  and  the  high-level  problem  of  representing,  constructing,  and  updating  the  scene 
model.  We  developed  techniques  for  extracting  building  structures  from  high  resolution  aerial 
images  of  urban  scenes,  including  lines  not  originally  found  but  predicted  by  the  model.  Image  lines 
were  classified  as  building  boundaries  or  other  lines  which  arise  from  texture  and  shadow 
boundaries.  We  also  experimented  with  efficient  methods  of  searching  a  line  image  in  order  to  form 
junctions  which  can  then  be  used  for  stereo  matching. 

At  the  higher  level  of  processing,  we  developed  techniques  for  representing,  constructing,  and 
updating  the  scene  model,  using  task-specific  knowledge. 

1 .6  RESEARCH  PROGRESS  IN  YEAR  TWO 

1.0.1  Optical  Feature  Extraction  and  Sub-pixel  Tracking 

The  optical  feature  extraction  phase  of  this  project  has  been  terminated  except  for  a  small  synthetic 
discriminant  function  (SDF)  effort  we  still  report  upon  (for  aircraft)  in  our  1985-66  report.  This  was 
necessary  because  of  the  reduction  in  ECE  funding  for  year  3  to  one-third  of  our  prior  year  2  level. 
Our  year  2  progress  and  the  proposed  tasks  for  year  3  work  included  in  the  new  task  list  are  all 
addressed  herein  and  terminated  (except  for  the  one  SDF  effort  noted  above).  Our  final  report  on 
these  is  contained  in  chapter  2.  appendix  Ai  and  the  appendices  of  our  proposal  referenced  in 
chapter  2. 


This  effort  included  attention  to  moment,  chord  and  other  optically-generated  feature  spaces. 
Architectures  for^acTT  of  these  methods  were  devised  and  initial  results  were  obtained.  These 
showed  :  the  ability  to  optically  implement  various  feature  extractors:  the  architecture  for  a  hybrid 
optical/digitat  moment  processor,  successful  initial  tests  of  this  architecture  on*a  ship  image  data 
base  and  a  robotic  pipe  part  data  base:  new  results  on  the  accuracy  of  distortion  parameter 
estimation  with  this  processor,  an  advanced  correlation  SOP  synthesis  method  and  most  successful 
initial  test  results  of  it  on  ATR  vehicles. 

Our  time  change  detection  work  has  achieved  various  significant  results  and  demonstrations  of  the 
ability  to  detect  sub-pixel  target:  rearrangement  of  our  software  to  insure  proper  statistical 
characteristics  of  the  generated  scenes;  the  development  of  new  single  differencing  methods  that 
prove  promising  for  clutter  suppression:  the  initial  formulation  of  general  space/time  filtering  for 
target  enhancement  and- background  suppression;  the  investigation  of  detector  limitation  effects. 

Our  investigations  have  revealed  a  potential  nonzero  mean  problem  in  the  correlated  noise  images 
with  high  correlation  coefficients.  This  problem  is  overcome  by  appropriate  modifications  to  our 
software.  Our  sof^tfare  is  also  rearranged  to  provide  a  more  unified  control  of  the  various  parameters 
characterizing  the  synthetic  image.  We  have  observed  that  while  the  exponential  sub>pixel  shift 
estimator  performs  better  than  the  parabolic  estimator  for  the  synthetic  data,  the  reverse  is  true  for 
LFM  signals.  This  indicates  the  need  to  consider  both  sub- pixel  estimators  in  the  future.  Our  efforts 
have  also  pointed  towards  more  sophisticated  space/time  processing  methods  for  better  clutter 
suppression. 

1 .0.2  Algorithms  for  Hybrid  Oigital/Optical  Representation  and  Matching 

This  phase  of  the  project  has  focussed  on  the  development  and  evaluation  of  methods  which  yield 
representations  of  structural  and  textural  information  in  an  image,  and  may  be  used  for  matching 
images  to  scene  models.  The  principal  results  achieved  in  this  research  include: 

•  ProbabiHstle  Graph  Matching  •  Attributed  graph  structures  are  used  as  models  of 
structural  and  statistical  information  in  the  image.  Matching  of  these  graph  structures 
using  probabilistic  similarity  methods  poses  a  number  of  interesting  problems  in  the 
mathematical  formalism,  in  the  computational  matching  algonthms,  and  in  the  application 
of  these  methods  to  real  images.  We  have  investigated  methods  of  subgraph 
decomposition  which  permit  branch-and-bound  search  of  the  matching  tree  and  provide 
efficient  pruning  of  the  possible  matches. 

•  Multiple  Resolution  Rotation -invariant  Operators  •  The  MRI  (Multiresolution  Rotation 
invariant)  operator  and  the  MRO  (Multiresolution  Difference)  transform  have  been 
introduced  to  extract  structural  and  textural  features  of  images  for  use  in  matching  and 
interpretation  phases  of  analysis.  The  MRI  is  a  complex  operator  derived  from  derivative 
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expansions  of  Gaussian  Kernels  and  will  have  magnitude  of  response  independent  of 
feature  orientation  and  phase  angle  of  response  which  provides  information  about 
orieniation.^h^spatial  and  frcnnency  domain  properties  of  these  operators  have  been 
studied  and  an  approximate  MRI  operator  which  uses  difference  of  shifted  Gaussian 
Kernels  has  been  derived  and  shown  to  be  computationally  efficient  due  to  the  scaling 
and  shift  properties  of  the  Gaussian  Kernel.  The  MRI  operators  have  been  applied  to 
aerial  images  of  objects  and  textures. 


•  Texture  Analysis  •  The  MRI  operators  described  above  have  been  used  to  characterize 
and  classify  textures  from  aerial  images.  This  set  of  multiresolution  operators  permits 
classification  of  texture  independent  of  the  size  and  orientation  of  the  texture  pattern 
Itself.  The  statistical  distribution  of  the  magnitude  responses  is  analyzed  across  the  set  of 
operators  for  regions  of  the  image.  Correlation  vnth  the  corresponding  magnitude  range 
and  the  corresponding  phase  distribution  provides  information  on  the  relative  scale  and 
the  relative  orientation.  Experiments  on  textures  from  aerial  images  and  textures  from 
simple  panems  have  been  carried  out  and  compared  to  previous  texture  energy 
operators. 

.  The  algorithms  studied  in  this  section  reflect  the  interdisciplinary  nature  of  the  project.  The  MRI 
operators  and  associated  texture  measures  are  particularly  well-suited  to  parallel  or  optical  processor 
implementation.  They  will  be  implemented  and  evaluated  on  the  array  processor  with  RAPiDbus  host 
Our  formulation  of.sthe  recursive  modei'mttching  algorithms  is  also  intended  for  implementation  on 
this  type  of  architecture  with  extensions  which  may  integrate  symbolic  and  numerical  processing. 
The  interactive  use  of  parallel  and  optical  preprocessing  with  hypothesis  formation  and  adaptive 
search  strategies  will  be  natural  continuation  of  the  work  completed. 

1.6.3  Image  Understanding  Techniques  for  30  Scene  Interpretation 


The  problem  of  detecting  three-dimensional  changes  in  a  complex  urban  scene  is  a  very  difficult 
one.  particularly  since  any  information  extracted  from  the  complex  images  is  highly  incomplete  and 
contains  many  errors.  Therefore,  we  have  thus  far  concentrated  mainly  on  the  problems  of  extracting 
information  from  such  images  and  accumulating  the  information  in  a  3D  scene  model. 


In  this  report,  we  describe  results  in  two  aspects  of  these  problems:  low  level  image  analysis  and 
high-level  model  maintenance.  The  goal  of  low-level  image  analysis  is  to  determine  a  set  of  3- 
dimensional  line  segments  in  the  scene  which  correspond  to  building  boundaries.  The  first  step  in 
such  a  process  is  to  map  the  two-dimensional  image  into  a  3-dimensional  scene.  One  method  of 
doing  this  is  to  perform  stereo  matcnmg  on  a  pair  of  images  and  use  triangulation  to  determine  the 
third  dimension. 


We  have  developed  a  stereo  aigc-thm  usmg  the  technique  of  dynamic  programming.  The  stereo 


matching  problem,  i.e.  obtaining  a  correspondence  between  right  and  left  images,  can  be  cast  as  a 
search  problem.  Wlfen^a  pair  of  stereo  images  is  rectified,  pairs  of  corresponding  points'  can  be 
searched  for  within  the  same  scanlines.  We  call  this  search  intra  scaniine  search.  This  intra-scanline 
search  can  be  treated  as  the  problem  of  finding  a  matching  path  on  a  20  search  plane  whose  axes 
are  the  right  and  left  scanlines.  Vertically  connected  edges  in  the  images  provide  consistency 
constraints  across  the  20  search  plar\es.  inter  scanline  search  in  a  30  search  space,  which  is  a  stack 
of  the  20  search  plartes.  is  needed  to  utilize  this  constraint. 

Our  stereo  matching  algorithm  uses  edge-delimited  intervals  as  elements  to  be  matched,  and 
employs  the  above-mentioned  two  searches:  inter- scanline  search  for  possible  correspondences  of 
connected  edges  in  the  right  and  left  images,  and  intra-scanline  search  for  correspondences  of 
edge-delimited  intervals  on  each  scanline  pair.  Oynamic  programming  is  used  for  both  searches 
which  proceed  simultaneously  at  two  levels:  The  former  supplies  the  consistency  constraints  to  the 
latter,  while  the  latter  supplies  the  matching  score  to  ttte  former.  An  interval-based  similarity  metric  is 
used  to  compute  the  score. 

In  order  to  pursuerthe  problem  of  high-level  model  m^ntenance  independent  of  the  current  state  of 
the  iow«level  image  analysis  research,  we  have  chosen  to  investigate  model  building  using 
rangefinder  data,  which  is  already  three  dimensiunal.  Specifically,  we  have  developed  techniques  for 
extracting  detailed,  complete  descriptiorrs  of  polyhedral  objects  from  light-stripe  rangefinder  data. 
The  descriptions  are  in  the  form  of  30  faces,  edges,  vertices,  and  their  topology  and  geometry.  A 
range  image  is  first  segmented  into  edge  pointa  A  line  drawing  is  then  obtained  by  fitting  linear 
segments  to  the  points  in  the  image,  and  refining  the  segments  to  eliminate  gaps.  Faces  are  then 
generated  from  the  line  drawing.  Interestingly,  although  the  final  description  is  in  30,  most  of  the 
processing  is  done  in  the  20  image  space.  This  work  will  be  applied  towards  the  goal  of  obtaining  a 
full  symbolic  description  of  a  scene  from  range  data  obtained  from  multiple  viewpoints  Our  30  model 
building  and  updating  results  are  detailed  in  Chapter  5. 


2.  GENERAL  3-D  FEATURE  EXTRACTORS  AND 
CORRELATORS 


This  ECE  project  phase  has  been  terminated  because  of  the  significant  reduction  in  funds  for 
1985-1986.  Because  of  the  lack  of  future  support  possibility,  contingency  funds  available  from  other 
sources  were  not  spent  to  continue  this  project. 

The  intent  of  this  task  was  to  employ  feature  extraction  and  correlation  techniques  to  locate,  track 
and  identify  large  targets  in  3-0  in  severe  clutter.  These  output  track  files  on  candidate  targets  would 
then  be  processed  by  the  iU/AI  portion  of  the  system. 

2.1  TARGET  GENERATION 

Moving  targets  and  aircraft  imagery  were  emphasized.  For  such  a  scenario,  we  proposed  a  quite 
novel  image  generation  software  package  for  aircraft  imagery.  This  routine  (Figure  2-1)  consists  of  3 
stages.  The  final  output  is  a  2-0  image  of  the  aircraft  as  seen  from  any  user  specified  orientation 
angle  9  and  for  any  object-centered  rotations  0  ,  9  and  9  and  at  any  scale  and  resolution.  The 

•  Jr  & 

aircraft  data  base  consists  of  Soviet  and  U.S.  military  aircraft  as  well  as  commercial  aircraft.  Figure 
2-2  shows  typical  images  of  several  of  these  aircraft  at  different  orientations.  A  most  attractive  aspect 
of  this  routine  is  the  efficiency  of  Step  3.  Specificaily,  our  initial  calculations  indicate  that  the  required 
matrix  transform  operations  can  be  computed  (for  ail  target  vertices,  to  determine  the  2-D  projections 
of  the  image  to  be  seen)  within  iSOfisec  using  a  quite  modest  array  processor.  This  has  significant 
importance  for  PP  since  one  can  now  realistically  assume  that  any  necessary  reference  image  (for 
correlation  or  feature  extraction  purposes)  can  be  computed  on-line.  We  also  began  initial  efforts  to 
modify  this  algorithm  to  enable  range  images  to  be  processed  (with  pixel  values  proportional  to  the 
range  of  that  portion  of  the  target).  This  satisfies  our  promised  research  on  the  proposed  Task  2  item 
for  1985-1986. 
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Figur*  2*1 :  3-0  Mod«(-Ba9«d  Object  Recognition 
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For  moving  target  acquisition  of  such  objects,  we  simply  generate  two  image  frames  with  a  standard 
background  and  with  an  aircraft  target  from  our  routine  in  Figure  2-1  inserted  in  each,  with  a 
displacement  of  the  target  between  two  successive  frames  (Figures  2- 3a  and  2- 3b).  A  simple 
differencing  of  these  two  frames  results  in  extraction  of  the  target  (Figure  2-3c).  More  advanced 
(higher  level,  etc.)  image  differencing  methods  are  required  for  other  cases.  A  time-sequence  of  such 
output  imagery  provides  information  for  a  target  track  file  and  for  input  to  an  optical  Kalman  filter  for 
state  estimation  and  trajectory  estimation.  At  this  point,  a  high -resolution  adjunct  sensor  can  be 
activated  to  track  the  object.  Alternatively,  a  laser  radar  providing  range  data  can  be  activated. 
•Figure  2-4  shows  typical  range  images  of  the  aircraft  generated  on  our  routine  in  Rgure  2-1. 


The  moments  of  an  object  can  be  optically  computed  [Casasent  at  al,  1982].  In  Appendix  A  of  our 
proposal,  we  fully  detailed  our  proposed  moment  processor  for  aircraft  classification.  This 
hierarchical  system  employs  two  first-level  estimators  (one  for  aspect  and  one  for  the  object  class) 
and  a  sacond-level  Bayesian  classifier  (requiring  a  nonlinear  iterative  technique  to  achieve  class  and 
information  parameter  estimation),  in  recent  work,  this  algorithm  has  been  fully  encoded,  but  due  to 
lack  of  furKis.  we  were  unable  to  test  it  on  our  aircraft  image  data  base,  initial  tests  showed  that  it 
provides  quite  accurate  object  orientation  estimates.  For  sufficiently  separated  ciasSSs,  it  was  found 
to  have  surprisingly  good  noise  immunity.  The  major  attraction  of  this  system  is  its  theoretical  basis. 
We  have  now  showed  that  moment  features  are  jointly  Gaussian  random  variables  for  input  plane 
translation,  scale  and  rotation  distortions.  A  Bayesivi  classifier  is  possible  snd  optimsi  (however, 
each  object  class  and  object  aspect  view  must  now  be  treated  as  a  different  image  class).  The 
first-level  estimatork  achieve  a  reduction  in  the  aspect  views  and  classes  that  the  second-level  system 
must  search.  The  class  estimator  in  this  first-level  of  the  processor  uses  unique  organized 
hierairchical  tree  structure  methods  for  synthesis  of  the  tree.  The  node  selection  and  discrimination 
function  per  node  in  the  tree  are  selected  automatically  using  a  two-level  Fisher  classifier  (follov/ing 
the  first-levef  multi-class  Fisher  classifier,  employed  to  achieve  organized  hierarchical  structure  for 
the  tree).  The  resultant  node  structure  is  thus  not  ad  hoc,  as  is  generally  done.  [Casasent  and 
Cheatham,  1985]  detail  the  most  recent  and  our  expected  performance  of  this  algorithm  (Appendix 


Figure  2-2:  TYPICAL  2D  IMAGE  OUTPUTS  FROM 
THE  SYSTEM  OF  FIGURE  2-1 


Flgur«  2*3: 

rame  1  (a)  and  Frame  2  wHh  a  target  obiect  shifted 
by  several  pixels  between  frames,  and  the  resultant 
output  difference  frame  (c) 


Figur*  2*4;  Examptes  of  synthotfcalty  gtneratod  ranga  imagaa 


Our  work  on  this  feature  extractor  was  summarized  in  Appendix  B  of  our  proposal.  This  technique 
appears  most  attractive  for  range  imagery  with  reliable  gray  scale,  and  m  th^s  case  it  appears 
preferable  to  other  realization  schemes.  No  further  work  on  this  powerful  optical  technique  to 
generate  these  distortion-invariant  features  in  parallel  has  been  performed. 


This  novel  class  of  correlator  that  promises  distortion-invariant  object  identification  was  fully 
summarized  in  Appendix  C  of  our  proposal.  It  is  thus  not  documented  further  here.  Initial  tests  of  this 
technique  on  aircraft  will  be  included  in  our  1985-1986  research  as  our  proposed  Task  7  research  for 
1S8S-1986. 


Only  initial  work  on  this  proposed  (1985-1986)  task  item  was  advanced.  We  developed  routines  to 
compute  and  display  histograms.  We  then  generated  range  imagery  of  selected  aircraft  and 
investigated  their  histograms  and  their  potential  use  in  distortion-invariant  object  classification, 
figure  2-5  shows  the  histogram  of  an  F15  with  in-plane  rotations  about  the  z  axis  by  0*.  30*,  60*  and 
90*.  As  seen,  all  histograms  are  essentially  identical.  This  verifies  that  histograms  are  invariant  to 
in-plane  rotations.  Figure  2-6  shows  the  histograms  for  the  same  object  in  Figure  2-5  scaled  in  range 
by  a  factor  a.  Comparing  Figures  2-5  and  2-6.  we  note  that  the  shapes  are  the  same  for  both  patterns, 
but  that  one  pattern  axis  is  shifted.  This  verifies  the  invariance  of  the  shape  of  the  histogram 
distribution  with  object  scale  and  the  ability  to  determine  object  scale  or  target  range  from  such  data. 
Figure  2-7  shows  histogram  plots  for  similar  object  rotations  for  an  MIG.  The  numbers  1-3  denote 
different  portions  of  the  aircraft  (wings,  fuselage,  tail  assembly).  A  comparison  of  Figure  2-7  with 
Figures  2-5  and  2-6  shows  that  the  shape  of  the  histogram  can  provide  aircraft  discrimination.  This 
concludes  our  report  for  1985-1986  on  our  proposed  Task  1  and  Task  6  research. 
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Figure  2*5:  Range  histogram  of  an  FIS  (64  x  64  pixels). 

The  reference  image  is  at  an  elevation  of  45* 
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3.  SUB-PIXEL  TARGET  DETECTION  AND 
TRACKING 


In  our  year  2  effort,  we  investigated  several  issues  concerned  with  the  detection  and  tracking  of 
sub-pixel  targets.  These  issues  include  improved  database  generation,  selection  of  optimal  sub-pixel 
location  estimators  and  interpolators  and  quantification  of  detector  limitations.  In  this  chapter,  we 
present  the  various  results  obtained  in  these  efforts. 

3.1  IMAGE  GENERATION 

Our  year  1  report  fully  outlined  the  software  needed  for  the  generation  of  synthetic  imagery  being 
used  for  simulation.  The  staring  sensor  image  l(x,y)  consists  of  three  separate  images :  a  sub-pixel  (of 
extent  less  than  i  pixel  in  the  detected  image)  target  with  a  constant  value,  Gaussian  correlated  noise 
(CN)  image  with  prescribed  mean,  variance  and  correlation  coefficients  to  simulate*  the  clutter 
background  and  uncorrelated  white  Gaussian  noise  (UCN)  image  of  zero  mean  and  prescribed 
variance  to  simuiate  the  instrumentation  noise.  These  can  be  generated  as  below. 

Let  g(x,y)  denote  a  NxN  array  of  white,  Gaussian  random  numbers  of  zero  mean  and  unit  variance. 
Such  an  array  can  be  obtained  from  the  IMSL  software  package  [IMSL1982].  Then  a  zero  mean  CN 
image  f(x,y)  with  variance  and  correlation  coefficients  p  and  p  can  be  obtained  by  the  following 
2-0  digital  infinite  impulse  Response  (HR)  filtering. 

/lx.y)  s  1)  ly)  T  p^x,y- 1)  4-  ff(l  -  p^  -  p*  -h  p^/^^''^g{x,y) 

(3.1) 

These  CN  images  are  used  to  simulate  clouds  at  various  heights.  Different  cloud  levels  are 
characterized  by  different  means,  variances  and  correlation  coefficients.  Because  of  the  small  time 
interval  between  successive  image  ‘rames.  we  assume  that  the  CN  images  are  coherent  between 
successive  frames(Rauch  1981].  Tins  picoerty  is  easily  accomplished  in  our  software  by  maintaining 
the  seed  value  for  the  random  numbe'*  gei’erator  (RNG)  to  be  the  same.  On  the  other  hand  the  UCN 
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image  is  used  to  model  instrumentation  noise  and  it  is  independent  from  frame  to  frame.  UCN  can  be 
easily  generated  by  multiplying  the  v( '  i  )  of  eq.(3.1)  (obtained  with  different  seeds)  by  proper 
constants.  The  sub  pixel  point  target  is  modeled  as  zero  outside  a  constant  square  region  (whose 
dimensions  are  less  than  l  detector  image  pixel)  in  the  high  resolution  image. 


The  CN.  UCN  and  target  images  are  combined  to  yield  a  high  resolution  (532x532  pixel)  image. 
Overlapping  blocks  of  approximate  size  8x8  are  then  combined  with  the  help  of  a  blur  function  to  yield 
the  detector  image  iii  r.y)  of  size  64x64  pixels.  The  blur  function  is  constant  in  the  interior  of  a  8x8 
region  anti  has  Gaussian  tails  at  the  hoarders.  Sub-pixel  motion  of  the  target  can  now  be  easily 
simulated  by  moving  the  taiget  by  integer  pixels  in  the  high  resolution  imagery. 


While  the  above  procedure  of  generating  a  staring  sensor  image  by  combining  CN,  UCN  and  sub¬ 
pixel  targets  at  high  resolution  and  blurring  them  seems  satisfactory,  we  observed  that  the  detected 
images  had  a  significant  non  zero  mean.  To  detect  the  source  of  this  discrepancy,  we  conducted  an 
investigation  of  the  statistical  parameters  yielded  by  the  IMSL  programs. 


The  mean  of  the  random  array  g(x,y)  of  size  NxN  is  obtained  as 


jy  jml  /ml 


It  can  be  easily  shown  [Papoulis]  that  this  mean  estimator  is  unbiased  and  has  a  standard  deviation  of 
(<r/JV)  where  <r^  is  the  variance  of  the  noise  g(x>').  For  the  images  of  interest,  a  ■  1  and  N  ^  500 
yielding  an  expected  standard  deviation  of  0.002  in  the  estimated  mean.  In  Table  3-1.  we  show  the 


estimated  means  and  variances  as  well  as  the  theoretical  standard  deviation  in  this  estimated  mean  as 


a  function  of  the  image  size  N.  It  can  be  seen  from  this  table  that  the  estimated  means  are  well  within 
(one  9)  their  expected  statistical  fluctuations.  Thus  the  PNG  being  used  seems  satisfactory. 


For  a  512x512  UCN  image  g(x>).  the  estimated  mean  is  of  the  order  0.002.  When  thi-s  UCN  image 


is  input  to  the  20  digital  MR  filter  of  eq.(3.i),  we  can  show  that  the  resulting  CN  image  f(x,y)  has 
following  estimated  mean. 


Id-Pj^l-P^l 


In  our  simulation  9  =  1  and  =  0.95.  Thus,  p^is  about  40  times  as  large  as  p^.  Thus  a 

variation  of  almost  0.08  can  be  seen  in  the  Cf-J  image  mean  as  a  result  of  vaiiation  of  0.002  in  the  UCN 


IS> 


. 


26 


SIZEN 

Mean  Estimate  p 

St.  Dev  {p} 

Variance  Estimate  a® 

10 

0.00218 

(1/10) 

0.9488 

50 

-0.01724 

(1/50) 

1.0018 

64 

•0.00181 

(1/64) 

0.9843 

128 

0.00548 

(1/128) 

0.9947 

256 

0.00252 

(1/256) 

0.9961 

512 

0.00055 

(1/512) 

1.0025 

Table  3*  1 :  Measured  estimated  statistical  parameters  for  UCN  data 

with  n  •  0 ,  a  •  1 

mean.  This  amplification  factor  increases  as  p^and  p ,  values  approaches  1 .  As  an  example,  is 
about  200  times  as  large  as  p  for  p  =  p .  =  0.99.  This  problem  is  illustrated  in  Table  3-2  where  we 
show  the  estimated  means  of  a  CN  image  for  various  P^~  p  values. 


P^P^P 

Estimated 

0.0 

•0.0004 

0.5 

•0.0014 

0.75 

•0.0034 

0.90 

•0.0099 

0.95 

•0.0217 

0.97 

•0.0388 

0.99 

•0.0683 

Table  3*2:  Measured  mean  estimates  for  a  CN  image  of  size  532x532 


To  overcome  this  problem  of  non  zero  mean  o.’npiification  due  to  digital  HR  filtering,  we  forced  the 
data  arrays  to  be  of  zero  mean  at  all  points  m  the  processing.  This  is  accomplished  by  estimating  the 
meen  vetoes  p  at  various  stages  and  then  simply  subtracting  them  from  the  data.  This  process 
rssuWed  in  a  mean  value  of  -3.3x10'^  (dose  to  computer  precision)  for  the  CN  image  whereas  it  was 
•2.18x10'^  before  this  processing.  This  'mpcriant  check  is  now  incorporated  into  our  image 
generation  software. 


Once  the  high  resolution  (532x532!  containing  CN.  UCN  and  sub-pixel  target  is  obtained,  it  is 
converted  to  a  sensor  imago  (64x€a)  using  3  ciur  function  b(x,y).  This  function  b(x.y)  has  a  constant 
value  1  in  the  center  and  decreases  rrcncic  cany  n  a  Gaussian  error  function  manner  towards  the 
edges.  Such  a  blur  function  model  acccu-'c  • ■  -  te  aoerture  effects  in  many  imaging  systemslHall). 
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For  Simplicity  of  ana|^is.  we  use  a  rectangular  blur  function  model  instead  of  the  correct  Gaussian 
function.  I  e.  we  assume  b{  v  i )  to  be  i  inside  a  souare  region  of  dimensions  PxP  and  zero  outside. 
Then  the  detector  image ;/( \  i  )  is  obtained  from  the  high  resolution  image ./( v  .1 )  as  below 


I 

/’'tTT  jT 


Since  the  operation  in  eq.{3.4)  is  linear.Jf  .r.; )  is  also  Gaussian  and  can  be  characterized  using  only 
first  and  second  order  moments.  Since  /(  a  j  )  is  of  zero  mean,  so  is  J(x.y).  it  is  instructive  to 
analytically  derive  the  second  order  statistics  of  Jix.y). 

E(.7(x.i-)<^(jf  +  Ajr.  A>')} 

.  P  P  P  f 

E{yTi+/>U-l).y+ /’ey- l))/TA:+/»(;r  + Ax- ])./+/>(>-+ A>-1))} 

f*  iml  jml  ikal  Iml 


r  tmi  jml  kml  Iml 


P  P 


P  P 


r  iml  kml  jml  /«l 


To  determine  the  variance  of  the  detector  image,  we  use  Ax  s  0  s  m  eg. (3.5)  along  with  the  fact 
the  terms  inside  the  double  sums  depend  only  on  the  difference  in  the  indices  to  obtain  the  following. 

Var{d(x.»}=i^{/».  ^ 

P  km~P  ^  km~P  ^ 

2p,(l-p')  /11-pb-2p  (l-pj 

i - -}■{ - ^ (3.6) 

P'  (I-P,)'  (1-p/ 

This  expression  is  used  in  Table  3-3  to  show  how  the  variance  of  the  detector  image  changes  as  a 
function  of  detector  size  P  and  original  CN  image  correlation  coefficients.  We  see  from  this  table  that 
the  variance  decreases  as  the  blur  function  size  increases  and  as  the  original  CN  image  becomes 
uncorrelated.  We  see  that  for  p  =  0.95  with  P-l,  the  detector  image  has  a  variance  of  0.771  instead  of 
one.  This  discrepancy  is  taken  into  account  in  evaluating  the  performance  of  our  various  algorithms 
for  sub-pixel  target  detection  and  tracking.  These  analytical  results  were  compared  with 
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catculated/measiired  estimates  of  variance  of  synthetic  images  and  very  good  agreement  (only  2%) 
error  was  observed.**'  ^ 


Correlation 

Detector  Size  P 

Coefficient  p 

2 

4 

8 

12 

0.0 

0.250 

0.063 

0.016 

0.007 

0.5 

0.563 

0.266 

0.098 

0.049 

0.75 

0.766 

0.525 

0.289 

0.178 

0.90 

0.903 

0.776 

0.596 

0.471 

0.95 

0.951 

0.882 

0.771 

0.679 

0.97 

0.970 

0.927 

0.655 

0.791 

0.99 

0.990 

0.975 

0.949 

0.924 

Table  3*3:  Theoretical  detector  image  variance  as  a  function  of  detector  size 
P  and  CN  image  correlation  coefficient  p  ■  p^  ■  p^ 


Finally,  analytical  results  are  derived  for  the  correlation  coefficients  p^'andp^'  of  the  detector 
image  d{x,y).  This  is  achieved  by  using  (Ax  s  1.  A^*  s  0)arKl(Ax  s  0  A>'  «  i)  seoarately  in  eq.(3.5). 
After  tedious,  but  straight  forward  algebra,  we  obtain 

p^i  -  // 

PJm - 

P(\  -  p*p  -  2p;i  -  pj 
and 

p/  - - ^ ^ -  (3.7) 

/»(!  -  P*)  -  2p;i  -  p') 

The  analytical  relations  in  eq.(3.7)  are  used  in  Table  3-4  to  show  how  the  correlation  coefficients  of 
the  detector  image  ^(x.;*)  are  affected  by  p  and  P.  This  clearly  shows  that  the  increasing  P  or 
decreasing  the  p  value  of  original  CN  image  leads  to  decrease  in  the  detector  image  p  values. 


While  the  above  theoretical  analysis  was  earned  out  with  the  assumption  of  rectangular  blur 
functions,  experimental  results  indicate  no  significant  differences  m  the  estimates  lor  Gaussian  blur 


-t  -■  «  t  .t 


Blur  Size  P 


2 

4 

8 

12 

0.80 

0.720 

0.563 

0.358 

0.245 

0.85 

0.786 

0.653 

0.458 

0.334 

0.90 

0.8SS 

0.755 

0.590 

0.469 

0.95 

0.926 

0.870 

0.766 

0.676 

0.97 

0.955 

0.920 

0.852 

0.789 

0.99 

0.985 

0.973 

0.948 

0.923 

Table  3*4:  Theoretical  correlation  coefficients  in  the  detector  image  as  a 
function  of  blur  size  P  and  original  CN  correlation  coefficient  p 


functions.  The  various  observations  noted  in  this  section  are  incorporated  in  our  software  to  provided 
a  unified  framework  for  image  synthesis. 


An  important  aspect  of  our  image  sequence  processing  is  the  estimation  of  sub-pixel  shift  in  the 
background  CN  images  between  successive  frames.  This  shift  is  then  used  along  with  all  interpolators 
to  produce  two  aligned  images.  These  two  properly  aligned  images  are  then  subtracted  from  each 
other  to  enhance  the  target  and  suppress  the  background.  In  our  year  1  report,  we  investigated  the 
use  of  4  sub-pixel  estimators,  namely  (i)  gradient-based  estimator,  (ii)  exponential  model  estimator, 
(iii)  parabolic  model  estimator,  (iv)  Least  Mean  Squared  (LMS)  estimator.  At  that  time,  we  showed 
through  simulation  that  the  exponential  model  based  sub-pixel  estimator  performs  best  as  this  model 
matches  precisely  with  the  correlation  function  of  the  CN  data,  in  this  section,  we  present  our  result 
on  the  use  of  the  two  non -parametric  methods  (parabolic  and  exponential)  on  a  more  general  data 
sequence. 

Because  of  the  ease  with  which  we  can  control  its  bandwidth,  duration  and  time  bandwidth  product, 
we  have  chosen  a  linear  frequency  modulation  (LFM)  signal  for  our  investigation.  The  pulse 
compression  ratio  (PCR)  of  this  LFM  sequence  is  defined  as  the  ratio  of  the  uncompressed  pulse 
width  to  the  compressed  pulse  v/idth.  or  the  product  of  the  pulse  spectral  bandwidth  B  and  the 
uncompressed  pulse  width  T.  Thus.  PCP  is  equal  to  the  time  bandwidth  product.  The  sub-pixel  delay 
estimates  obtained  for  3  different  PCP  values  and  3  different  sequence  lengths  are  shown  in  Table 
3-5, 
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Sequence 

Estimator 

Estimated  Delay 

Length 

0.1 

0.2 

0.4 

300 

Parabolic 

0.071 

0.142 

0.284 

Exponential 

0.062 

0.132 

0.306 

900 

Parabolic 

0.088 

0.175 

0.352 

Exponential 

0.068 

0.145 

0.-342 

1500 

Parabolic 

0.092 

0.184 

0.369 

Exponential 

0.069 

0.148 

0.350 

300 

Parabolic 

0.091 

0.181 

0.364 

Exponential 

0.068 

0.147 

0.349 

900 

Parabolic 

0.096 

0.192 

0.386 

Exponential 

0.069 

0.150 

0.358 

1500 

Parabolic 

0.097 

0.194 

0.381 

Exponential 

0.070 

0.151 

0.3S9 

300 

Parabolic 

0.094 

0.188 

0.381 

Exponential 

0.067 

0.146 

0.366 

900 

Parabolic 

0.096 

0.193 

0.»1 

Exponential 

0.067 

0.147 

0.^ 

1500 

Parabolic 

0.096 

0.193 

0.383 

Exportential 

0.067 

0.147 

0.368 

Tabl«  3-S:  Sub-pixel  delay  estimates  for  ttie  LFM  signal 

One  can  see  from  Table  3-5  that  increasing  sequence  length  improves  the  estimation  accuracy  in 
general  and  increasing  the  PCR  also  improves  the  estimation  accuracy.  In  general,  the  parabolic 
estimator  seems  to  outperform  the  exponential  estimator.  The  exponential  astimator  seams  to 
perform  better  for  large  sub-pixel  delays,  short  sequences  and  low  PCRs.  As  will  be  seen  in  the  next 
section,  use  of  LFM  signals  enables  us  to  observe  the  effect  of  estimator  inaccuracies  on  the  process 
performance  without  worrying  about  the  interpolators.  This  is  because,  once  the  sub-pixel  shift  is 
estimated,  it  can  be  used  in  the  analytical  expression  for  LFM  signal  to  obtain  an  ideally  interpolated 
signal.  With  this  analytically  interpolated  image,  we  observed  background  suppression  of  almost  50 
dB  (far  belter  than  observed  with  the  synthetic  images). 


The  estimated  sub-pixel  shifts  for  the  synthetic  images  aro  shov/n  in  Table  3-6.  We  see  from  this 
table  that  the  exponential  estimator  outperforms  the  parabolic  one  in  alt  cases.  This  is  because  the 
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synthetic  images  being  generated  have  exponential  correlation  functions.  Since  such  correlation 
structure  can  not  always  be  guaranteed,  it  is  decided  to  pursue  both  estimators  m  future. 


Exact  Shift 
( 0.25.0.25) 
( 0.25.  0.25) 
(•0.25.0.25) 
(-0.25.  0.25) 


Estimated  Shifts 
Parabolic 
(0.184.0.174) 

(0.186.  0.174) 

(•0.184. 0.173) 

(0  181.0.171) 


Exponential 
( 0.227, -0.237) 
( 0.232.  0.233) 
(-0.219.-0.238) 
(-0.234.0.239) 


Table  3-6:  Sub-pixel  shift  estimates  for  the  synthetic  CN  imagery 


3.3  INTERPOLATOR  SELECTION 


4 


After  the  sub-pixel  shift  between  two  successive  frames  is  estimated,  we  have  to  interpolate  one  of 
the  two  image  frames  to  align  it  with  other.  We  will  denote  the  two  detector  image  framea  by 
(x. >) and  we  denote  the  interpolated  image  i  by  Then  the  performance  of  the 

interpolation  is  estimated  by  the  following  measure  known  as  the  Background  Suppression  Ratio 
(BSR) 


10-log 


Var{d.(x.j>)} 

Var{Jj(x.>)-  (/j(x.>)} 


(3.8) 


This  BSR  measure  is  useful  in  evaluating  the  performance  of  the  estimators  and  interpolators 
separately. 


The  obiective  of  the  interpolators  is  to  produce  d^(x.y)  which  is  a  shifted  version  of  d{x.y),  namely 
d^{x.y)  =  d^{x  +  £^x.y^£^y)  (3.9) 

where  Ax  and  denote  the  shifts  in  x  and  y  directions.  We  consider  several  interpolator  schemes  to 
be  discussed  below. 

The  2-0  linear  interpolator  estimates  the  value  d.(x+ Ax.>'4- A;')  from  its  4  nearest  neighbors  as 

4 

below. 

rfi(jf .>')*(1  -  Ax)(l  -  A>)J  (x  -  A X ( 1  -  A;.)</,(x..>'+ 1) + 

A^d  -  A  ij  ■  r  «  —  AxA>d,  (x+ 1  y+  1)  (3.10) 

4 

where  Ax  i  Oand  A^^O.  The  cor'’e':r''e'-.s  eq  (3  10)  can  be  easily  seen  by  using  Ax  =  0  =  A^ 
which  yields  d.  (x.y)  =  d^{x,y). 


I 

li 

! 
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The  2-D  quadratic  interpolator  uses  a  3  x  .t  array  of  values  m  </,  (.».;•)  as  below  to  yield  the  estimate 
i7  («  r).  —  ^ 

.7  (X  .)  =  lY|(A(»..i)|(X| 


where 

lXj  =  (0.5Ajx(Ajr-I)  (I-AjT)  0.5A»(Ar+ D)'’ 
lYl  =  (0.5Aj(A.i-l)  (l-Ar)  0.5A.i<Ar+ 1)) 

and 


dAx-l.)-l) 

d^(x- l.y) 

d^ix-l.y-hl) 

lA(x>)|  = 

djx.y-  1) 

d^(x.y) 

d^(x.yhl) 

d^(x-hly-l) 

d^(x-hly) 

d^(x+l.y+l)  _ 

(3.11) 


The  cubic  spline  interpolators  are  discussed  thoroughly  elsewhere  [Hou  and  Andrew]  and  it  is 
probably  sufficient  *to  point  out  the  fact  these  are  based  on  local  piecewise  polynomial  fit  to  the 
available  data.  We  carry  out  this  cubic  spline  interpolation  by  using  IM5L  software.  One  can  easily  see 
that  the  computational  complexity  of  the  interpolators  increases  as  we  go  from  linear  interpolation  to 
quadratic  method  to  cubic  spline  based  method. 


We  can  analytically  predict  the  BSR  to  be  observed.  The  numerator  of  eq.(3.8),  namely, 
Var  {d.  (x;-)}  is  given  by  whereas  the  denominator  of  eq.(3.8)  is  as  below  We  make  the  assumption 
of  perfect  interpolation.  Then, 

Var{</j(x>')-  ^j(x>))  »  2a'-2  Cov{</j(x>).  d^(xy)} 

»2o^-2e{d(xyyd(x’^-Ax.y-hAy)}  (3.12) 

for  the  detector  image  d(x./),  the  required  covariance  can  be  shown  to  be  given  as 


Cov  { <y(  V.  1 ).  t^-v.  ;•  +  A.t )} 


/*(i-p;)-2p^(i-p^) 


(/*+Aj)(i-p;)-2p,"‘^''+ 


/*(l-p,)-2p^(l-p^) 


The  analytically  derived  cross  covariances  for  cr  =  1  (equivalent  to  detector  image  correlation 
coefficients)  are  shown  in  Table  3-7  for  /*  =  8  and  p^  =  p^  =  0.9S.  Note  front  this  table  that  the 
correlation  coefficient  of  the  covariance  value  (not  p)  changes  from  a  maximum  of  1.0  (when  the  two 
images  are  perfectly  aligned)  to  a  minimum  of  0.784  (when  one  image  is  shifted  by  0.5  pixels  in  each 
direction  with  respect  to  the  other). 


Vertical’ 


Horizontal  Sub-pixel  shift 


Table  3*7:  Correlation  coefficient  of  Cov  (not  p)  comparison  between 
two  detector  images  wHh  different  sub- pixel  shifts 

For  a  sub-pixel  shift  of  (0.25, -0.25).  Table  3-7  indicates  that  the  two  image  frames  have  a  correlation 
coefficient  of  0.953  yielding  a  variance  in  the  difference  image  of  0.069  according  to  eq.(3.l2).  The 
experimentally  observed  variance  of  the  difference  image  is  0.061  agreeing  well  with  our  theoretical 
results.  Then  the  limit  on  BSR  achievable  seems  to  be  more  fundamental  than  the  simple  interpolation 
problems.  Based  on  this,  the  simple  2  0  inear  nterpoiator  seems  to  be  our  best  choice  as  it  needs  the 
minimum  computational  complexity. 
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Our  previous  simulations  do  not  tane  into  account  the  fact  that  the  correlation  plane  detector 
needed  for  sub  pixel  shift  estimation  suffer  from  nomdealitics  such  as  limited  dynamic  range  (DR), 
detector  noise  and  detector  area.  Vijayu  Kumar  et.ai.  [Kumar]  have  previously  investigated  the  effects 
of  finite  detector  area  on  parabolic  sub  pixel  shift  estimators  and  have  shown  that  these  introduce 
small  biases  in  the  estimated  shifts.  These  sub-pixei  shift  estimators  use  the  central  5x5  region  of  the 
correlation  plane  and  the  acceptable  DR  limitations  on  the  detector  (about  30-50  dB)  seem  to  pose  no 
problems  m  accurately  detecting  these  correlation  values.  This  is  because  of  the  large  correlation 
coefficients  of  the  CN  part  of  the  image. 

To  observe  the  effect  of  detector  noise  on  sub- pixel  shift  estimators,  we  added  uniformly  distributed 
random  numbers  to  the  central  5x5  region  correlation  plane  values.  The  variance  of  this  uniformly 
distributed  numbers  is  chosen  such  that  signal  to  noise  ratios  (SNRs)  of  20,  30,  40  and  50  dB  are 
obtained  in  the  detector  plane.  The  sub-pixel  shift  estimates  for  various  SNRs  are  shown  in  Table  3-8. 


Estimated  Shifts 

Correct  Shift 

SNR(d8) 

Parabolic 

Exponential 

( 0.25.-0.2S) 

20 

(0.183.-0.173) 

(0.226,-0.235) 

30 

(0.184..0.173) 

(0.226,0236) 

40 

(0.1 84, -0.1 74) 

( 0.227,-0.237) 

50 

(0.184,-0.174) 

( 0.227,-0.237) 

( 0.25,  0.25) 

20 

(0.186.  0.172) 

(0.231,0.233) 

30 

(0.186.0.171) 

( 0.232, 0.233) 

40 

(0.186.  0.171) 

( 0.232,  0.233) 

50 

(0.186.  0.171) 

( 0.232,  0.233) 

(•0.25.-0.25) 

20 

(-0.183,-0.171) 

(-0.222,-0.236) 

30 

(-0.182,-0.173) 

(-0.220,-0.238) 

40 

(-0.182,0.173) 

(-0.21 9, -0.238) 

50 

(-0.182,-0.173) 

(-0.219,-0.238) 

Table  3-8:  Sub-pixel  shift  estimates  for  various  detector  plane  SNRs 


Comparing  Tables  3-6  and  3-8,  we  see  that  SNRs  higher  than  40  dB  have  very  little  effect  on  the 
estimated  sub-pixel  shifts.  Thus  SNRs  of  -ic  dB  are  required  in  the  correlation  plane. 
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3.5  DOUBLE  DJ^FFERENCING 

The  approach  to  background  suppression  discussed  so  far  has  been  to  estimate  the  sub-pixel  shift 
followed  by  a  first  difference  operation.  In  this  section,  we  present  some  preliminary  results  indicating 
the  role  of  double  differencing  for  target  detection  and  tracking. 

Let  </j(.r.j).  J^ix.y)  and</,(A-..i)  represent  three  successive  image  frames  in  which  the  target  and  the 
background  are  moving  at  different  velocities.  The  single  difference  imago  is  given  by  ■ 

JiX.y)  rz  )(/ (A.J')-(/j(jf.>')|  (3.14) 

whereas  the  double  differencing  yields 

.7(.v.j')  =  |0.5Jj(.v.y) +  0.5r/j(  .».>-)  I  (3.15) 

In  Fig.  3-1,  we  show  the  results  obtained  by  these  processes  on  three  images.  The  top  three  images 
in  the  figure  are  three  detector  images  d^.  J^and  d^.  The  CN  background  in  d^  is  shifted  by  (0.25,0.25) 
with  respect  to  the  background  in  d^,  and  the  CN  background  in  d^  is  shifted  by  (0.25,0.25)  with 
respect  to  the  background  in  d^.  The  target  in  all  three  detector  images  is  of  size  equal  to  one 
detector  plane  piMi  and  is  of  uniform  intensity  of  2,  while  the  variance  of  CN  background  is 
approximately  one.  The  target  is  moving  at  a  consent  velocity  with  a  horizontal  shift  of  3  detector 
pixels  and  a  vertical  shift  of  2  detector  pixels  between  each  adjacent  frames.  We  see  from  the  two 
single  difference  images  in  the  second  row  of  this  figure  that  the  background  is  not  suppressed 
completely.  On  the  other  hand,  double  differencing  result  shown  in  the  last  row  of  this  figure  displays 
a  clear  track  of  the  target  movement.  One  should  keep  in  mind  that  the  images  in  Fig.  3-1  are 
thresholded  optimally  and  thus  may  not  convey  the  complicated  nature  of  the  processing. 

The  results  presented  in  this  section  are  only  preliminary.  Some  year  3  effort  will  be  devoted  to 
analyzing  the  capabilities  of  this  double  differencing.  The  relevant  issues  include  the  BSR  achievable, 
the  Quantization  effects,  the  resuiting  frequency  responses  and  the  need  or  lack  of  need  for 
interpolation.  Fruitful  research  in  this  direction  is  anticipated  for  year  3. 

3.6  SPACE/TIME  FILTERING 

The  single  differencing  and  double  differencing  approaches  discussed  earlier  represent  two  special 
cases  of  a  more  general  philosophy  of  target  detection  and  tra-. -i'lg  known  as  "vpace/time  filtering'*. 
To  understand  this,  we  consider  the  various  image  frames  available  as  samples  of  a  3-0  function 
/Ix.y.i).  The  sampling  intervals  and  .^.i  denote  the  spatial  sampling  according  to  the  detector 
size.  This  3-0  function  can  be  modeled  as 

Ax.y,i)  =  J(z.>./)  +  CV(x>'./)  -  UCMx.y.t)  (3.16) 
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Figure  3‘1:  Single  difference  and  double  difference  images  of 

an  image  sequence 


3: 


where  .s(.\.  1. 1 )  denotes  the  sub  pixel  target, (  \  ( v.  r.  / )  denotes  the  correlated  noise  and  I  (  V  ( v,  i,  / ) 
denotes  the  uncorrelated  noise.  The  goal  of  the  space/time  filtering  is  to  process  the  sampled  3  D 
function  to  enhance  the  target  \(  while  suppressing  the  remaining  terms  m  eq.(3.l6). 

The  target  vt  v.  r.  i)  can  be  modeled  as  a  thin  straight  line  m  the  3-D  space  with  the  dimensions  of 
this  line  in  x  and  y  axis  being  sub-pixel  in  nature.  The  T  .V(.i.  i  /)  vanes  slowly  in  x  and  y  and  shows 
linear  shift  in  /.  On  the  other  hand,  the  uncorrelated  noise  U(  \tx.\.i)  is  completely  random  and  is 
characterized  by  high  frequencies.  By  observing  the  3-D  spectra  of  the  three  components  m  eq.(3.l6). 
we  plan  to  denve  an  optimal  space/time  filtering  scheme  for  the  3-D  sequence i).  issues  to  be 
resolved  in  this  connection  are  sampling  effects,  ‘optinial’  filters,  computation  complexities  and 
computationally  efficient  (sub-optimal)  filters. 

3.7  FUTURE  WORK 

Our  year  3  effort  will  focus  on  better  understanding  of  the  general  techniques  presented  here.  In 
addition  to  this,  we  will  improve  our  image  generation  software  to  incorporate  multi-region  image 
generation.  We  witl  also  explore  the  optical  interpolation  methods.  Other  advanced  sub-pixel  shift 
estimators  such  as  maximum-likelihood  and  maximum  a  posteriori  wilt  be  considered. 
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4.  MODEL-BASED  ALGORITHMS  FOR  HYBRID 
DIGITAt/OPTICAL  PROCESSING 

4.1  SUMMARY 

The  objective  of  this  research  effort  is  to  develop  algorithms  for  representation  and  interpretation  of 
space-based  images  which  are  well-suited  to  hybrid  digital/opticai  implementation. 

In  the  first  phase  of  this  program  we  have  developed  a  muitiresolution  rotation-  invariant  (MRI) 
operator  which  may  be  used  to  extract  .structural  features  as  well  as  characterize  textures  using 
statistical  measures.  Experiments  in  texture  classification  have  shovm  that  the  MRI  operator  is  a 
useful  representation  of  texture  properties  and  provides  classification  independent  of  rotation  and 
scale.  Probabilistic  graph  matching  was  used  to  demonstrate  matching  between  attributed  graph 
representation  of  structural  image  elements.  The  operators  wo  have  described  are  well-suited  for 
optical  implementation,  and  the  matching  of  representations  derived  from  these  operators  is  suited 
for  implementation  on  a  hybrid  digital/optical  system.  Evaluation  of  these  algorithms  and  their  hybrid 
system  implementation  will  be  carried  out  through  simulation  on  the  RAPIDbus  II  system.  Further 
refinements  of  the  high-speed  RAPIObus  architecture  would  support  a  hybi  id  digital/  optical  interface 
when  available. 

These  approaches  may  be  integrated  into  a  recognition  framework  based  or.  recursive  model 
matching  in  which  composite  MRI  kernels  are  generated  adaptively  based  on  hypothesis  formation  in 
a  model-based  setting.  Recursive  model  matching  is  intended  to  explore  the  capabilities  of  a  highly 
interactive  hybrid  digital/optical  system  which  utilizes  digital  hardware  to  generate  hypotheses  in  a 
knov^ledge-  based  environment  and  uses  optical  hardware  to  explore  and  validate  hypotheses  using 
convolution-based  adaptive  feature  extraction  mechanisms. 

4.2  RAPIDBUS  ARCHITECTURE 

One  important  aspect  of  the  integrated  image  analysis  system  sought  by  this  project  is  the  hardware 
and  programming  environment.  Contemporary  environments  were  not  designed  to  coupis  a  high 
bandwidth  electro-optic  processor  with  digital  processors  doing  numeric  and  symbolic  calculation. 
The  RapidBus  II  prototype,  being  developed  for  this  project,  provides  both  a  near-term  execution 
environment,  and  a  longer  term  opportunity  to  develop  new  architectural  concepts  oriented  toward 
the  needs  of  an  integrated  image  analysis  system. 

The  RapidBus  II  prototype  is  currently  in  the  assembly  and  testing  stage.  Design  documentation  is 


■UaLNVtPUIUIV 


I 

! 

i 


i 

>• 

E 


41 

being  completed  using  our  enhanced  SCALD  II  CAD  system.  Many  components  and  subassemblies 
have  arrived  and  arT" waiting  for  integration  and  test.  Over  the  coming  few  months  we  anticipate 
testing  to  progress  through  a  two.  six.  and  finally  twelve  processor  stage.  Through  the  donation  of  a 
PCB  design  system  from  IBM,  our  CAD  system  is  being  enhanced  to  carry  the  design  through  multi¬ 
layer  PCB  film. 
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Figu  re  4*  1 :  RapidBus  II  prototype  multiprocessor  configuration. 

Software  development  is  proceeding  in  parallel  with  hardware  development  using  both  a  multiple 
processor  Versabus  system,  and  a  stand-alone  RapidBus  II  node  (IBM  CS-9000).  The  host 
development  system  is  functional  for  C  and  assembly  code.  The  target  operating  system  is  operating 
on  a  single  node,  and  will  soon  be  expanded  to  a  dual-processor  Versabus  system.  An  outside  group 
is  doing  parallel  development  of  a  multiple-processor  Franz-Lisp  system  in  return  for  a  duplicate 
RapidBus  II. 

Design  concepts  for  an  advanced  machine  are  emerging  from  the  RapidBus  II  effort.  Under  the 
system  name  RapidGraph,  a  new  high  speed  interchange,  multi-programmed  processor,  and  object 
support  node  have  been  developed.  The  interchange  provides  message  passing  at  multi -gigabyte  / 
sec  rates  using  a  small  number  of  bipolar  VLSI  devices.  The  processor  is  designed  to  handle  digital 
computation  using  an  "object-flow"  model  to  integrate  both  symbolic  and  numeric  computation.  The 


object  support  n^e^implifies  the  design  of  large,  highly  parallel  software  systems  through  the 
encapsulation  of  objects  at  the  memory  rather  than  through  the  processor  instruction  stream. 

4.3  Probabilistic  Graph  Matching 

In  a  variety  of  image  processing  problems,  the  data  contains  stereotyped  subpatterns  which  are 
well -described  by  symbolic  representations.  Such  symbolic  representations  include  graph,  grammar, 
and  automata  models.  While  these  models  are  very  useful  when  subpatterns  are  highly  invariant  to 
image  variability,  symbolic  search  and  manipulation  techniques  become  very  complex  when  symbol 
correspondence  becomes  uncertain  Symbolic  representations  may  be  enhanced  in  two  respects 
which  increase  their  applicability  to  real  data.  First,  stochastic  structures  may  be  used  to  associate 
outcome  probabilities  with  structural  relations  of  the  model.  Second,  attributed  structures  offer  a  rich 
class  of  models  where  subpatterns  or  symbols  have  associated  features  or  attribute  values.  Such 
attributed  structural  models  pose  many  difficult  methodological  issues  for  implementation.  In  this 
study  we  have  addressed  problems  of  the  dichotomy  between  symbolic  and  statistical  information 
and  its  effect  on  the  choice  of  symbol  primitives,  issues  of  structural  observability,  structural 
matching,  assumptions  of  component  independence,  and  identification  of  structural  transformations. 
These  issues  will  be  discussed  in  papers  and  reports  now  in  preparation. 

An  attributed  random  graph  model  consists  of  a  4-tuple  R  ■  (V.a.E./})  where: 

1.  the  random  vertex  set  V  ■  {  V  ,  i  >  1 . n},  where  each  V  is  a  random  variable  called  the 

random  vertex. 

2.  the  random  edge  set  E  ■  {  E  ,  i «  1 . n,  j  ■  1,...,n}  in  V  X  V  where  each  E  is  a  random 

variable  called  the  random  edge. 

3.  the  random  vertex  attribute  set  a  •  {Oj,  i  •  1 . n}  where  each  a  is  a  random  variable 

with  possible  outcomes  {a}. 

4.  the  random  edge  attribute  set  fs  ■  ^ . n,  j  ■  i,...,n}  where  each  is  a  random 

variable  with  possible  outcomes  (b}. 

5.  Each  outcome  of  R  >  (V.a.E.^)  is  an  attributed  graph  H  >  (v.a.e.b)  with  probability  P(H) 

■  Prob  {V  ■  v.rt  ■  a,E  «  e,/?  ■  b}  such  that 

•  P(H)  >  OforallH.t.r, 

•  ZpPfH)  ■  1,  where  r  IS  the  "ange  of  R. 

P(H)  >9  the  probability  distr.buhcn  of  R. 

The  attributed  random  graph  moce'  -e'^i-ed  aoove  provides  a  basis  for  the  definition  of  likelihood 


functions  over  the  observed  outcomes  from  the  class  of  graphs  and  the  attribute  set  A.  The  likelihood 
of  an  observed  outcome  may  be  used  as  a  basis  for  the  matching  and  recognition  of  patterns  m  the 
image.  In  this  application  the  structural  elements  of  the  image  are  associated  with  vertex  and  edge 
symbols  of  the  model,  and  both  structural  relations  and  quantitative  properties  of  the  elements  are 
retained  in  the  model  image  components  such  os  vertices,  edges,  regions,  or  intensity  peaks  may  be 
used  as  structural  elements.  In  the  resulting  probabilistic  model,  each  element  has  some  outcome 
probability,  some  observation  probability,  and  some  probability  density  of  attribute  values. 

A  simple  example  of  a  graph  representation  derived  from  a  gray  level  image  of  a  polyhedron  is 
shown  in  figure  4-2.  A  line  drawing  of  the  original  image  is  shown  in  figure  4-2a.  The  graph  structure 
extracted  frcm  a  single  observation  is  shown  in  figure  4- 2b  where  graph  vertices  have  been  attached 
to  structural  corner  elements  of  the  original  image  and  graph  edges  have  been  attached  to  edge 
elements  of  the  original  image.  An  ensemble  of  observations  such  as  that  in  figure  4-2b  is  used  to 
derive  a  probabilistic  graph  model  such  as  that  shown  in  figure  4-2c.  In  figure  4-2c,  the  probability 
distribution  of  positions  of  the  vertices  vare  indicated  by  circles.  The  probability  distribution  of  vertex 
angle  attributes  is  indicated  by  p(^). 

Probabilistic  graph  matching  may  be'  used  for  matching  of  images  and  recognition  of  objects  in 
images  using  likelihood  criteria  as  a  basis  For  search  correspondence  trees.  The  likeiihood  of  any 
observed  graph,  subgraph,,  or  structural  element  may  be  computed  and  used  for  estimation  or 
decision  making.  Such  problems  incorporate  three  phases:  (t)  correspondence  matching  of  graph 
elements.  (2)  rigid  graph  pose  estimation,  and  (3)  likelihood  calculation.  The  probabilistic  graph 
model  uses  pose  independent  likelihoods  to  hypothesize  correspondence,  then  estimate  pose.  The 
use  of  attributes  to  guide  correspondence  matching,  and  the  use  of  observation  probabilities  to 
structure  the  search  results  in  simplified  and  reliable  algorithms. 

We  have  applied  the  probabilistic  graph  matching  approach  to  two  typos  of  im.-jge  representation. 
Graphs  which  are  derived  from  edge,  corner,  and  junction  components  of  gray  level  images  are 
useful  for  description  and  matching  of  objects.  In  this  case,  attributes  include  lengths,  angles,  and 
positions  of  elements.  The  resulting  graph  models  have  been  used  to  classify  objects,  inspect 
objects,  and  determine  orientation  of  objects  in  scenes  where  edge  information  is  a  reliable  clue.  An 
example  of  matching  likelihoods  between  a  model  graph  and  various  distorted  observation  graphs 
including  partial  views  is  shown  in  figure  4.3,  The  likelihood  is  a  measure  of  the  correspondence 
between  the  two  structures  in  each  case  a  similar  approach  may  be  used  to  track  movement  of  the 
model  object  by  matching  successive  viev.o  .■md  computing  the  pose  changes  between  views.  Such 
an  example  is  shown  in  figure  4-4  for  the  same  object  used  m  the  previous  examples. 


Figure  4-2:  Probabilistic  graph  derived  from  an  ensemble  of  gray -level  images. 
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Figure  4-4:  Tracking  movement  of  an  obiect  using  successive  likelihood  matches 

with  a  probabilistic  graph  model. 


As  a  second  example,  we  have  used  probabilistic  graph  models  for  matching  of  muitiresoiuiion  tree 
structures  derived  from  gray-level  images.  The  derivation  of  such  a  muitiresoiuiion  tree  using  the 
difference  of  low-pass  transform  is  described  in  the  next  section,  in  this  case,  the  matching  algorithm 
IS  formulated  as  a  hierarchical  tree  search  using  likelihoods  to  guide  correspondence  matching  at 
each  level  of  the  tree.  Results  of  this  approach  are  described  m  [Crowley  and  Sanderson  84], 

Probabilistic  graph  models  provide  a  general  approach  to  many  practical  matching  and  recognition 
problems  v/here  a  training  set  of  images  of  a  prion  knowledge  of  the  probability  structure  is  assumed. 
In  this  way.  obiect  model-based  information  may  be  incorporated  using  a  priori  probability  structures. 
Many  previous  approaches  to  matching  and  pose  estimation  may  be  considered  subsets  of  the 
probabilistic  matching  approach  m  which  edge  or  region  attributes  of  images  are  related  by  heuristic 
similarity  measures  rather  than  iikelihoods. 

4.4  MRI  Operators  for  Shape  Representation 

Description  of  grsy-scale  shape  in  images  is  complex  because  shapes  are  often  defined  by  some 
combination  of  ragioo  information  and  sdge  information.  From  tha  point  of  view  of  image  processing, 
region  information  is  often  contained  in  the  lower  spatial  frsquency  components  of  the  image,  while 
edge  information  is  contained  in  the  higher  spatial  frequency  components.  A  complete  description  is 
difficult  to  achieve  therefore  from  extraction  of  structural  elements  at  one  resolution  level.  A  number 
of  techniques  have  been  proposed  which  transform  the  two-dimensional  gray-level  image  to  a  three- 
space  representation  in  (x.y.K)  space,  where  k  is  the  parameter  of  the  resolution  space.  Such  s 
representation  has  the  advantage  that  peak  structures  in  the  (x.y.k)  space  shift  uniformly  along  the  k 
axis  under  scale  transformatior^.  and  therefore  obiects  of  different  size  may  be  recognized  in  a 
representation  with  the  same  structural  relationships,  in  this  section  we  describe  an  extension  of  the 
multiple  resolution  tree  which  incorporates  structural  basis  functions  at  each  level  in  order  to  provide 
a  more  complete  description  at  each  level  and  include  orientation  specific  structural  components  at 
each  resolution  level. 

Our  preliminafY  work  on  representation  and  probabilistic  matching  of  multiple  resolution  structures 
has  been  carried  out  using  a  reversible  transform  called  the  "difference  of  low-pass"  or  DOLP 
transform  developed  by  Crowley  [Crowley  St  Crowley  and  Stern  84,  Crowley  and  Parker  84].  The 
DOLP  transform  expands  a  single  gray  eve*  mage  f(x  y)  into  a  set  of  bandpass  images  b(x.y.k),  where 
k  is  the  index  of  the  multiple  resolution  tree  E::cri  bandpass  image  is  obtained  by  convolution  of  the 
original  image  with  an  appropriate  banbcass  mpulse  response  function  h(x,y,k).  In  the 
implementation  of  this  transform  by  C  '-o-viey  81).  the  bandpass  impulse  responses  are 
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constructed  out  of  difference  of  Gaussian  kernels  The  most  efficient  implementation  utilizes 
properties  of  the  Gaussian  kernel  function  which  permits  resampling  and  cascaded  convolution  with 
expansion  and  reduces  the  sequential  computation  of  this  transform  from  0(N  )  multiplies  and 
additions  to  0(N). 

The  DOLP  transform  set  itself  is  not  a  very  efficient  representation  of  the  image  since  it  requires 
expanded  storage  space  m  a  normal  digital  representation.  However,  it  is  possible  to  extract  a 
symbolic  representation  of  important  information  from  the  DOLP  transform  using  peaks  of  the 
transform  arrays  as  key  structural  elements  of  the  image,  in  [Crowley  and  Sanderson  84),  we 
introduced  two  levels  of  symbolic  representation.  The  first  level  is  composed  of  symbols  derived 
directly  from  the  DOLP  images  based  on  local  positive  maxima  or  negative  minima  in  one,  two,  or 
three  dimensions  of  the  DOLP  space  representation.  The  second  level  of  symbols  utilize  the 
connectiv.'ity  among  peaks  and  ridges  to  form  peak  paths  and  ridge  paths.  These  symbol  structures 
are  defined  in  detail  in  [Crowley  and  Sanderson  84]. 

The  advantage  of  the  multiresolution  representation  techniques  is  the  ability  to  describe  both  high 
resolution  and  low  resolution  structural  image  features  in  the  same  representation.  The 
disadvantages  of  the  bandpass  filter  approach  are  the  difficulty  in  describing  complex  shapes, 
particularly  those  involving  oriented  components  and  the  current  demands  of  the  computation  to 
compute  such  extensive  filtering  operations.  In  order  to  enrich  the  capabilitiel  of  the  multiple 
resolution  transform,  we  have  introduced  a  set  of  basis  functions  at  each  resolution  level  which 
includes  oriented  two-dimensional  basis  functions.  This  set  of  structural  basis  functions  provides  a 
much  more  complete  description  of  the  image  at  each  level,  at  the  expense  of  redundant  information 
and  increased  computational  load.  In  the  context  of  this  project  we  propose  to  explore  the 
implementation  of  such  techniques  using  parallel  and  optical  processors.  In  this  context  the  basis 
function  tree  provides  a  richer  source  of  information  for  matching  and  interpretation  which  may  be 
searched  interactively  rather  than  exfiaustiveiy  computed. 

Multiresolution  representations  provide  a  basis  for  searching  an  image  database  with  respect  to  size 
independent  features.  Cur  experience  with  the  DOLP  transform  described  in  section  4.2.4  has 
suggested  that  configurations  of  DOLP  peaks  are  in  fact  good  descriptors  of  image  structures,  but 
that  as  the  resolution  is  increased  such  configurations  become  exceedingly  complex.  In  addition, 
lighting  conditions,  shadows,  and  background  variations  may  cause  significant  distortions  of  the 
DOLP  representation  which  are  difficult  to  interpret  due  to  the  lack  of  more  specific  orientation  and 
structure  information  in  the  DOLP  transform  itself.  The  transform  itself  accentuates  symmetrical 
contrasting  regions  and  is  useful  for  locating  regions  of  interest,  but  may  not  be  very  efficient  in 
describing  complex  structures. 
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We  have  introduce^  the  MRI  (multiresolution  rotation  invariant)  operator  and  the  MRD 
(multiresolution  difference)  transform  in  order  to  derive  a  more  efficient  representation'  of  complex 
structure  os  well  as  texture.  The  MRI  operator  of  order  n  and  resolution  k  is  defined  by; 
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where  (r,f>)  are  polar  coordinates  of  the  operator  space.  The  MRI  operator  is  rotation  invariant  in 
the  sense  that  the  magnitude  of  the  response  is  independent  of  the  orientation  of  a  directional 
component  of  the  input  image. 

The  significance  of  the  MRI  operators  may  be  seen  by  examining  their  projection  p(x)  along  any 
single  radial  axis.  These  operators  of  order  n  have  the  following  interpretation; 

•  n  a  0;  Point  Detector 

•  n  a  1 ;  Edge  Detector 

•  n  a  2:  Line  Detector 

•  n>2:  Higher  order  ripple  detectors 

Each  of  the  complex  operators  defined  in  this  manner  will  have  magnitude  of  response  related  to 
the  magnitude  of  that  feature,  and  angle  of  the  response  related  to  the  orientation  of  that  feature.  In 
addition,  the  detector  masks  for  different  orders  are  orthogonal,  and  therefore  energy  is  distributed 
independently  among  the  features.  The  Fourier  spectra  of  these  MRI  operators  show  that  n  a  o 
corresponds  to  a  low-pass  filter,  while  operators  of  increasingly  higher  order  n  correspond  to  band¬ 
pass  filters  of  decreasing  bandwidth. 

The  MRD  transform  is  defined  by; 


(7) 


! 


1  -r^/2ko^ 


1  -r2/2(k+l)o^]  jn# 

k+l  • 


Based  on  this  definition  the  MRO  of  order  0  is  lust  OOLP  transform.  The  MRO  terms  of  higher  order 
iilso  provide  a  reversible  decomposition  of  the  gradient  image  of  order  n.  That  is.  the  MRO  transform 
of  order  n  of  an  image  is  sufficient  to  reconstruct  the  nth  order  gradient  of  the  same  image,  but  not 
necessarily  the  original  image  itself. 

The  MRI  operators  provide  a  basis  for  multiresolution  decomposition  of  an  image.  The  resulting 
multiresolution  representation  may  then  be  interpreted  as  the  response  of  a  set  of  orthogonal  feature 
detectors  and  searched  for  significant  response  regions  which  will  characterize  the  structure  in  the 
image.  Unlike  peaks  in  the  OOLP  transform  space  which  do  not  carry  orientation  information, 
magnitude  peaks  in  the  MRI  space  may  be  associated  with  the  angle  response  to  provide  important 
structural  clues.  In  the  proposed  research  program  we  will  implement  and  evaluate  the  MRI  operators 
as  tools  for  the  representation  and  detection  of  structural  features  in  aerial  images  of  airports  and 
harbors. 

The  interpretation  of  the  MRI  operators  in  the  Fourier  spectrum  may  be  related  to  the  performance 
of  the  texture  energy  measures  reviewed  above.  The  texture  energy  measures  provide  statistical 
information  about  the  sampled  local  two-dimensional  spectrum  of  the  image  at  some  resolution  level. 
Statistical  summary  information  from  the  MRI  operator  space  includes  distribution  estimates  of 
orientation  from  multiple  operators  as  welt  as  magnitude  information,  in  the  proposed  research  we 
will  implement  and  evaluate  the  MRI  operators  as  toots  for  the  description  and  segmentation  of 
textured  regions  in  aerial  images  of  airports  and  harbors. 

A  number  of  extensions  to  the  MRI  operators  and  MRO  transform  have  been  investigated.  The 
shifted-  Gaussian  MPI  operator  seems  to  provide  significant  computational  advantages  for  digital 
implementation  although  it  is  an  approximation  to  the  rotational  invariance  property  of  the  MRI  itself. 

The  aerial  imagery  being  examined  is  often  obtained  in  a  multispoctral  format,  and  spectral  contrast 
is  often  a  useful  clue  to  structural  and  textural  features.  Extensions  of  the  MRI  operators  in  which  the 
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phase  space  is  mapped  into  a  spectral  domain  seems  to  be  a  feasible  extension  of  the  concepts  to 
provide  representation  of  color  features.  Initially,  application  of  the  SG-MRI  operator  with  contrasting 


color  coordinates  for  the  component  Gaussians  will  be  examined. 


The  SG-MRI  operator  also  has  a  direct  extension  to  the  detection  of  temporal  shift  of  structural 
features.  By  associating  Gaussian  components  with  different  time  frames,  magnitude  and  orientation 


of  time  shifts  behveen  frames  can  be  obtained,  initially,  we  propose  to  develop  these  concepts  in  a 


space  time  frequency  domain  framework  and  study  the  tuning  of  operators  to  various  types  of  feature 


shifts. 


The  operators  and  extensions  described  above  are  all  well-suited  to  optical  implementation,  but  for 
our  studies  this  functionality  will  be  simulated  using  the  RAPiDbus  II  architecture.  This 
implementation  provides  the  basis  to  examine  the  use  of  these  operators  in  a  system  where 
interactive  search  and  adaptive  tuning  of  the  masks  is  a  significant  property  of  the  algorithms. 


The  recursive  model  matching  strategy  described  in  the  section  3.6  requires  interactive  tuning  of 
correlation  masks.  .The  masks  used  in  those  studies  will  be  derived  from  the  set  described  here  and 
their  extensions.  A  great  many  possible  composite  MRI  operators  could  be  defined  based  on  the 
primary  set,  and  these  could  play  a  useful  role  in  the  recursive  strategy. 


Figures  4-S-4-8  show  the  effects  of  changing  one  parameter  at  a  time  on  the  mask  shapes  of  the 
MRI  operators  of  the  real  and  imaginary  planes  and  their  corresponding  magnitude  and  phase  planes. 


Fig.  4-5  demonstrates  the  effect  of  changing  the  size  of  the  mask.  It  is  important  that  the  mask  size 
chosen  be  large  enough  so  that  the  values  at  the  edges  of  the  mask  are  near  zero.  Failure  to  do  this 
will  mean  that  the  mask  is  not  symmetric  and  the  property  of  rotational  invariance  will  no  longer  be 
valid.  As  can  be  seen,  changing  the  size  of  the  mask  does  nothing  to  the  generated  kernel  itself;  it 
only  affects  the  extent  of  the  kernel  that  will  be  included  in  the  mask. 


Fig.  4-6  shows  the  effect  of  varying  the  order  of  the  masks  created  while  keeping  the  remaining 
parameters  the  same.  The  value  of  n  is  equal  to  half  the  number  of  zero-crossings  encountered  in 
either  the  real  or  the  imaginary  planes  as  a  contour  at  a  fixed  radius  R  >  0  is  followed  for  2v  radians,  n 
«  0  is  a  low-pass  filter,  n  >  1  is  an  edge  detector,  n  *  2  is  a  line  detector,  n  >  3,4,5,...  are  higher 
order  ripple  detectors.  Note  that  the  radius  of  the  maximum  magnitude  uf  the  mask  pair  increases 
linearly  as  n  increases.  This  holds  true  only  when  (j'k  is  a  constant. 


Figs.  4-7  i  4-6  show  the  effects  of  varying  a  and  k  respectively.  These  two  parameters  always 
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appear  tegelher  irrthe^uatipn  above.  Together,  they  form  the  term  a  k  which  can  be  considered  to 
be  the  "variance"  of  the  gaussian  filter.  Increasing  either  will  make  the  spread  of  the  gaussian  larger. 
The  resolution  parameter,  k  can  be  regarded  as  "fine  resolution"  since  increasing  by  a  small  amount 
will  change  the  spread  of  the  gaussian  far  less  than  will  the  same  change  in  9,  the  "coarse 
resolution".  Therefore,  in  the  two  figures  we  see  that  there  is  very  little  difference  in  the  plots  in  which 
k  was  increased  but  we  see  a  much  larger  difference  in  the  plots  in  which  9  was  increased. 

The  following  is  an  example  of  the  results  obtained  by  applying  different  masks  to  a  single  image. 

4.5  Texture  Classification  Using  MRI  Operators 

Texture  occurs  in  images  due  to  either  irregular  surface  topography  or  to  nonuniform  surface 
reflectance.  There  have  been  a  number  of  approaches  to  the  modeling  of  texture  in  images  [Haralick 
70,  Laws  79.  Harwood  et  al  83],  Most  of  these  rely  on  the  modeling  of  local  correlation  properties  of 
the  gray-level  image  using  either  direct  statistical  measures  or  using  the  response  to  specific  masks. 
In  particular,  [Laws  79]  described  a  set  of  texture  energy  measures  in  terms  of  the  response  to  linear 
3  X  3  or  5  X  S  mastts.  These  masks  are  chosen  to  reflect  combinations  of  center-weighting,  edge 
detection,  and  spot  detection  templates.  The  distribution  of  the  outputs  of  these  masks  averaged 
across  a  textured  region  was  shown  to  be  useful  for  the  disc.nmination  of  texture  types.  Harwood 
[Harwood  et  al  83]  extended  this  idea  and  studied  the  use  of  rank  correlation  statistigs  as  a  basis  for 
discrimination. 

Texture  models  such  as  those  described  above  summarize  descriptions  of  the  variations  in  image 
intensity,  but  do  not  relate  image  properties  to  either  surface  topography  or  surface  reflectance.  An 
alternative  model  of  image  texture  has  been  proposed  using  fractal  geometries  to  model  image 
texture  and  relate  image  texture  to  surface  topography.  Fractal  geometry  was  introduced  by 
[Mandelbrot  77.  Mandelbrot  82]  to  describe  certain  classes  of  irregular  edges  or  surfaces  including 
coastlines  and  mountain  profiles.  More  recently.  [Pentland  83a,  Pentland  83b]  proposed  the  use  of 
the  fractal  dimension  to  characterize  images  of  natural  scenes  and  perform  texture  segmentation. 

The  fractal  dimension  0  is  the  dimension  of  a  measurement  spaco  expressed  relative  to  the 
topological  dimension  E.  If  the  parameter  H  >  0  •  E  is  used  to  characterize  the  roughness  of  the 
observed  texture,  then  H  «  0  corresponds  to  a  flat  plane,  while  H  *  1  corresponds  to  an  array  of  spikes 
covering  the  plane.  In  terms  of  H,  the  cumulative  distribution  function  of  the  fractal  Brownian  function 
B(t)  is: 

/^«)= /*'<(//(/+ A/)- /f(/)l/|A/|</i). 
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where  if  Hat/2  and  if  F(n)  is  a  zero-mean,  unit  variance  Gaussian,  the  B(t)  defines  a  classical 
Brownian  motion  process.  The  fractal  dimension  in  this  sense  is  a  compact  parametric  description  of 
a  homogeneous,  isotropic  random  process  which  has  advantages  since  it  is  invariant  to  scale. 
Pentland  [Pentiand  83a.  Pentiand  83b]  related  the  image  texture  model  to  a  natbral  surface  model 
and  showed  that  if  the  surface  topology  is  fractal  then  the  image  intensity  is  also  fractal  if  the  surface 
obeys  Lambertian  assumptions.  This  result  for  fractal  models  is  suggestive  of  more  powerful  results 
which  might  be  achieved  by  relating  image  texture  to  more  general  random  models  of  surface 
topography. 

Studies  of  random  surface  topography  suggest  four  p.'incipal  contributions  to.the  resulting  textured 
image: 

•  Local  edges  of  the  surface  elements 

•  Shading  due  to  surface  gradient  and  reflectance 

•  Shadows  due  to  disparities  between  light  incidence  and  viewing  angles 

•  Local  edges  of, one  surface  element  occluding  another. 

These  mechanisms  are  associated  with  surface  topography  and  not  with  reflectance  changes  due,  for 
example,  to  surface  markings.  The  observed  image  texture  varies  in  predictable  ways  with  angle  of 
view  and  lighting  directions,  and  we  would  like  to  identify  image  texture  measures  which  provide 
consistent  measures  of  such  changes. 

Work  in  this  area  has  centered  around  the  use  of  texture  energy  measures  as  described  in  [Laws 
79],  Both  real  and  simulated  images  have  been  used  to  demonstrate  the  efficacy  of  this  technique  in 
distinguishing  different  2-dimensional  textures.  Extension  of  this  method  to  3-dimensional  texture 
analysis  is  currently  being  studied.  Such  analysis  will  aid  in  understanding  the  relationship  between 
surface  contours  and  image  texture. 

Simple  one-  and  two-dimensional  masks  form  the  basis  for  texture  energy  measures.  The 
distribution  of  the  outputs  of  the  different  masks  averaged  over  a  textured  region  is  useful  in  texture 
discrimination.  These  masks  are  chosen  to  reflect  combinations  of  level,  edge,  and  spot  templates. 
The  one-dimensional  vector  masks  are  we-ghted  towards  the  center  and  all  are  either  symmetric  or 
antisymmetric  and  all  but  one  are  zero-sum  Five  length  vectors  are  generated  by  convolving  two 
three-length  vectors.  One-dimensicnai  .-cctor  masks  can  be  run  both  horizontally  and  vertically 
across  an  image.  The  two-dimensionai  musns  are  formed  by  convolving  a  horizontal  vector  with  a 
vertical  vector  of  the  same  length  Figure  -i  3  inows  a  number  of  the  masks  that  are  used  in  texture 
energy  measurements. 


Eight  of  the  3x3  masks  and  all  one-dimensional  vector  masks  except  for  the  13  and  15  level  vectors 
are  zero-sum.  Convolv^  over  a  region  with  uniform  pixel  intensity  (i.e. -no  texture),  a  zero-sum  mask 
will  produce  an  output  that  is  identically  zero.  "Texture  Energy"  refers  to  non  zero  values  resulting 
from  convolutions  with  zero-sum  masks.  A  Textured  region  is  first  histogram  equalized  to  ensure  that 
every  region  starts  with  the  same  average  intensity.  The  various  masks  are  convolved  over  the  region 
separately,  creating  a  number  of  "texture  planes",  one  for  each  mask.  The  average  pixel  intensity  for 
each  resulting  plane  is  then  taken  as  a  texture  energy  measure  and  collectively  they  form  a  feature 
vector  that  can  be  used  in  texture  classification. 

Fourier  analysis  of  the  various  kernels  reveal  that  these  texture  energy  masks  serve  as  bandpass 
filters  in  the  frequency  domain.  Alone,  the  3- length  vector  masks  13.  s3  and  e3  correspond  to  low- 
pass,  high-pass  and  band-pass  filters  respectively.  The  5- length  vector  masks  also  operate  as  filters, 
though  each  mask  peaks  in  a  narrower  frequency  range.  The  two-dimensional  masks  work  as 
bandpass  filters  in  the  2-d  frequency  plane.  Each  of  the  nine  3x3  masks  is  found  to  peak  in  a 
predictable  manner  in  each  quadrant  of  this  plane  as  shown  in  figure  4-10.  Each  5xS  and  7x7  mask 
aiso  has  a  unique  peak  in  the  2-d  frequency  plane.  The  set  of  all  masks  of  size  NxN  covers  the  entire 
frequency  plane.  * 
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Figure  4>9:  Texture  Energy  Masks 
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An  image  with  a  number  of  simulated  textures  (4-1 1)  was  generated  and  texture  energy  measures 
were  determined  for  each  of  those  textures.  The  first  row  of  textures  is  column  oriented.  The  first 
block  alternates  dark  and  light  columns:  the  second  block  alternates  two  dark  and  two  light  columns; 
the  third  block  alternates  four  dark  and  four  light  columns:  and  the  fourth  blo'ck  of  the  first  row 
repeats  the  sequence  of  (... light. medium.dark.  medium..)  columns.  The  second  row  of  textures  is  row 
oriented.  It  is  identical  to  the  first  row  except  that  the  textures  have  been  rotated  90  degrees.  The 
first  block  of  the  third  row  has  a  checkerboard  pattern.  Each  square  of  the  checkerboard  is  one  pixel 
in  size.  The  second  block  of  third  row  has  alternating  light  and  dark  diagonal  lines  that  are  two  pixels 
v/ide.  initially,  alt  texture  patterns  have  the  same  average  intensity  Random  noise  was  introduced 
into  the  image  before  processing.  The  texture  measures  for  this  simulated  image  are  tabulated  below 
in  Table  4-t.  Because  the  row-  and  column-oriented  textures  are  obviously  linear,  3-  and  5-length 
linear  vector  masks  were  used  in  addition  to  the  3x3  masks.  The  values  obtained  indicate  that  the 
different  textures  do  indeed  result  in  unique  texture  energy  measures  and  that  these  measures  could 
be  used  to  distinguish  between  textures. 

The  aerial  image  shown  in  figure  4-12  was  used  in  applying  texture  energy  masks  to  a  real  image. 
Four  textured  sections  containing  dirt,  grass,  and  two  different  sections  of  water  were  taken  from  the 
original  image  and  each  was  histogram  equalized.  The  3x3  masks  were  then  used  to  generate  the 
textural  planes  and  from  them  the  texture  energy  measures  were  obtained.  Two  of  the  sections  were 
images  of  water,  one  section  was  a  grassy  field  and  the  other  an  uneven  area  of  dirt  &  vegetation. 
The  results  of  this  texture  analysis  are  tabulated  in  Table  4-2.  The  two  sections  of  water  had,  as 
expected,  very  similar  energy  measures.  The  measures  for  the  section  with  the  grassy  field  and  the 
section  with  the  dirt  both  differ  significantly  from  the  water-containing  sections  but  differ  from  each 
other  (by  >  10)  only  in  the  e3e3,  e3l3,  and  s3l3  masks.  However,  using  just  these  three  masks,  they 
can  be  soparated. 

Results  here  show  that  texture  energy  measures  can  be  applied  to  ueriai  images  which  have  sizable 
textured  regions.  Information  gained  from  macroscopic  texture  analysis  can  aid  in  understanding 
changes  in  land  usage  and  local  scene  analysis. 

The  primary  objective  of  te.xture  classification  is  to  identify  textured  regions  within  the  image.  If  it  is 
known  where  the  texture  boundaries  are.  classification  of  the  different  regions  can  be  done  with 
relative  ease.  In  such  a  segmented  image,  the  task  is  one  of  determining  a  number  of  texture 
measures  for  the  large  areas  and  then  classifying  those  areas  into  one  of  several  known  classes. 
However,  when  no  a  pr/on  .  information  is  known  about  the  texture  boundaries,  the  task  becomes 
harder.  Texture  measures  must  be  calculated  for  each  pixel  and  the  pixels  then  classified.  The 
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Tabla  4- 1 :  Texture  Measures  for  the  Simulated  Image 
challenge  here  is  to  develop  an  algorithm  that  will  accurately  classify  pixels  without  incurring  a  great 
computational  cost.  This  involves  find  ug  the  smallest  possible  feature  vector  that  will  offer 


Figure  4*12:  Aerial  Image  used  for  Analysis  with  Texture  Energy  Masl^s 
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Table  4-2:  Texture  £-e'g>  Measures  for  Real  Images 
reasonably  good  int^r  class  separation  a 5  oed  below,  the  MRl  operator  was  found  to  ba  very 
effective  m  classifying  textures  p'*cl  by  pi>  o' 


Training 

The  [Brodau  68)  texture  album  contains  photographs  of  many  textures.  Four  stucturally  similar 
"cellular"  textures,  aluminum  wire,  cotton  canvas,  raffia  and  oriental  str.aw  cloth  were  chosen  for  this 
work.  These  textures  were  chosen  because  of  their  similarity  in  cell  size  and  cell  shape.  They  may  be 
considered  to  form  a  "worst  case"  set  in  the  sense  that  they  are  very  similar  to  each  other  and 
therefore  niay  be  hard  to  tell  apart. 

Fig  4-13  shows  the  composite  image  that  was  used  as  the  training  image.  128x128  samples  each  of 
the  four  textures  were  used  for  this  purpose.  Each  sample  was  individually  histogram  equalized  to 
eliminate  first  order  differences.  This  ensures  that  differences  in  brightness  between  samples  after 
processing  with  the  MRI  operator  are  caused  by  the  convolution  operation  and  are  not  due  to 
differences  in  the  initial  brightnesses  of  the  samples.  Training  involved  the  use  of  28  different  MRI 
operators  and  each  time  the  following  process  was  carried  out: 

1 .  convolution  of  the  composite  image  with  the  operator  in  the  frequency  domain  and 
conversion  of  the  result  back  into  a  complex  image  in  the  spatial  domain. 

2.  conversion  of  the  complex  image  into  a  normalized  8-bit  integer  image  giving  the  gray- 
level  magnitude  for  each  pixel. 

3.  an  1 1  x1 1  average  smoothing  of  the  3-bit  integer  image 

4.  calculating  average  gray-level  intensity  and  standard  deviation  measures  for  100x100 
internal  regions  of  each  of  the  four  samples.  The  internal  regions  were  used  rather  than 
the  whole  128x128  section  to  avoid  including  edge  pixels  in  the  average  and  standard 
deviation  measures. 

The  results  of  performing  this  set  of  steps  with  each  operator  is  summarized  in  Table  4-3. 

This  was  followed  by  determining  which  masks  gave  the  greatest  between-class  separations.  To  do 
this,  the  "inter-class  ratio"  was  determined  tor  each  pair  of  classes  for  ever/  mask.  Briefly  stated,  the 
inter-class  ratio  is  the  difference  in  averages  between  two  classes  divided  by  the  sum  of  their 
standard  deviations  at  a  particular  mask.  For  mask  m  and  any  two  classes  i  and the  inter-class  ratio 

is  defined  as: 

l^m  _  l.-fvgf/n  i)- Avg{rtiJ)\ 

The  inter-class  ratios  calculated  from  the  entries  in  Table  4-3  are  given  in  Table  4-4. 
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Table  4*3:  average  &  std  dev  for  each  class  for  each  MRI  operator 
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For  classification,  it  is  important  to  use  masks  that  give  the  greatest  mter-class  separation.  It  is  also 
important  to  limit  the  number  of  masks  used  in  the  interest  of  minimizing  the  amount  of  computation 
required.  Therefore,  for  classification  purposes,  it  is  best  to  choose  that  minimum  set  of  MRI  masks 
that  will  give  us  maximum  separation  between  any  pair  of  classes.  From  Table  4-4  it  is  evident  that 
one  of  the  masks:  n2sl,  n2s3  or  n3s6  will  give  the  maximum  inter-class  separation  between  any  pair 
of  the  four  textures.  Therefore,  only  these  three  masks  need  to  be  used  for  classification  of  textures 
known  to  belong  to  this  set  of  four. 
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Classification  Results 

A  second  composite  image  similar  to  the  first  was  created.  Different  portions  of  the  same 
photographs  used  in  the  training  image  were  use<l  to  make  this  second  image.  Each  of  the  three 
masks  was  convolved  with  the  composite  image,  creating  three  feature  planes.  The  gray-level  values 
provided  by  these  planes  at  each  pixel  served  as  a  feature  vector  of  length  three  which  was  then  used 
to  classify  that  pixel.  A  minimum  distance  classifier,  using  the  averages  found  for  each  class  and  for 
each  mask  from  the  training  image,  was  employed  to  perform  the  classification.  Results  show  that  95 
?b  of  pixels  in  the  interior  regions  of  the  different  sections  can  be  classified  accurately  while  in  the 
entire  composite  image  88  %  of  the  pixels  are  correctly  classified.  Fig.  4-i4  shows  the  composite 
image  used  for  classification  and  4- is  shows  the  resulting  pixel-by- pixel  segmentation. 

Tables  4-5  &  4-6  give  a  detailed  evaluation  of  how  the  classifier  worked  including  numbers  and 
percentages  of  correctly  classified  and  mis-classified  pixels  in  both  internal  regions  and  in  the 
composite  image. 
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Table  4*5:  Classification  Accuracy  of  100  x  100  Interior  regions 

Overall  accuracy  95  % 
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Table  4-6:  Classifies:  Accuracy  of  Entire  Composite  Image 

Overall  accuracy  87% 
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Figure  4- 1 3 


3 :  Texture  samples  used  for  training 
top  left;  straw  cloth,  top  right:  raffia 
bot  left:  cotton  canvas,  bot  right;  aluminum  wire 


§ 


4.6  Structural  Analysis  Using  MRI  Operators 

The  MRI  operator  Is  also  useful  for  the  early  processing  of  structural  images.  Both  the  order  and 
standard  deviation  of  the  operator  can  be  tuned  to  different  image  features.  A  single  correlation 
between  image  and  complex  MRI  kernel  determines  both  the  strength  of  the  feature  and  it's 
orientation  (for  higher  than  zero  order  operators).  The  following  four  figures  illustrate  different 
kernels  applied  to  the  same  input  image  of  an  aircraft  on  a  runway.  Notice  for  the  first  order 
operators,  a  standard  deviation  of  six  effectively  traces  most  of  the  aircraft  outline.  Once  the  aircraft 
IS  located  operators  with  a  lower  standard  deviation  help  to  locate  features  such  as  the  engines. 
Region  operators,  with  a  second  order  kernel,  help  to  locate  the  center  line  of  the  airframe  and  engine 
cowlings  respectively. 


Figure  4>16:  Result  of  applying  the  MRI  edge  detecting 

operator  {N  ■  1]  with  a  standard  deviation 
of  two  pixels.  [A]  is  the  original  image. 
[B]  is  the  magnitude  of  the  transform,  and 
[C]  the  transform  phase. 
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Figu  rt  4- 1 7 :  Result  of  applying  the  MRI  edge  detecting 

operator  [N  >  1  ]  with  a  standard  deviation 


of  six  pixels.  [A]  is  the  original  image, 
[B]  is  the  magnitude  of  the  transform,  and 
[C]  the  transform  phase. 
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Figure  4*18:  Result  of  applying  the  MRI  region  detecting 

operator  [N  ■  2]  with  a  standard  deviation 
of  three  pixels.  [A]  is  the  original  image, 
[Bj  is  the  magnitude  of  the  transform,  and 
[Cl  the  transform  phase. 
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4.7  Recursive  Model  Matching  Algorithms 

Recursive  model  matching  allows  a  broad  range  of  feature  extractors  to  be  interactively  applied  to 
an  incoming  image  under  guidance  from  a  library  of  hierarchically  structured  models  At  each  step  m 
the  recursive  analysis  of  a  frame,  an  ensemble  of  hypotheses  are  active  suggesting  obiects  and 
textures  present  m  the  image.  Evaluation  of  each  model  based  hypothesis  will  suggest  additional, 
specific  features  which  might  be  extracted  from  the  image  in  order  to  lend  support  or  disprove  the 
hypothesis.  As  additional  features  which  support  a  particular  hypothesis  become  known  to  the 
evaluation  process,  more  narrowly  defined  models  may  be  invoked  as  new  hypotheses  Ultimately 
each  high  level  hypothesis  must  be  resolved  down  to  either  highly  probable  termm.tl  models  or 


determinations  of  an  unfounded  hypothesis. 


The  strengths  of  both  electro- optic  and  digital  multiprocessor  technology  are  symbiotically  paired 
by  the  recursive  model  matching  structure.  Processors  such  as  a  real-time,  optical  correlator 
[Casasent  78]  allow  a. reference  kernel  function  to  be  applied  in  parallel  to  an  entire  grey  level  image, 
limited  only  by  tlie  rate  at  which  data  can  be  digitally  scanned  in  and  out  of  the  device.  Decision 
intensive  steps,  in  which  the  detected  features  are  evaluated  in  the  context  of  specific  object  models, 
map  well  onto  a  digital  multiple  instruction  stream,  multiple  data  stream  processor.  Associated  work 
in  computer  architecture  (RAPID3US  &  RAPIOGRAPH)  is  providing  a  basis  for  the  simulation  and 
implementation,  respectively,  of  such  a  tightly  coupled  system. 


4.7.1  Feature  Space 

Evaluation  of  actual  high  altitude  or  space  based  imagery,  such  as  that  shown  in  figure  4-20, 
underscores  the  wealth  of  different  kinds  of  features  v/hich  need  to  be  identified  and  cataloged  in 
order  to  match  existing  model  data.  Througn  the  integration  of  data  from  diverse  features,  ambiguity 
caused  by  lighting,  partial  object  occlusion,  and  sampling  noise  can  frequently  be  resolved  Several 
rlifferent  classes  of  feature  extractors  have  been  identified  as  being  potentially  useful; 

•  The  multiresolution,  rotationaiiy  invariant  operator,  desnnbed  earimr  m  this  report.  a.isists 
in  the  detection  of  points,  edges,  and  imes  withm  a  single  brightness  or  spectral  plane. 
Evaluation  of  an  image  by  the  MRI  operator  yields  a  list  of  candidate  features  described 
by  an  x.y  location  and  maigmtude  for  n  *0  at  each  of  k  resolution  levels.  For  higher 
order  operators  (n  ■  1,  n  ■  2),  a  direction  is  assigned  to  each  feature  point. 

•  Texture  energy  measures  provide  an  useful  means  of  identifying,  characterizing,  and 
segmenting  ''background"  region^  -  set  of  N  Kernel  functions  are  used  as  a  basis  set, 
characterizing  a  sample  texture  in  •-.?  N  dimensional  feature  space  oy  the  "nearest" 
reference  texture  point.  Eouroar  of  a  particular  texture  region  can  pe  detected 
through  gradients  in  the  texture  ei'e'g .  measure  assifjned  to  local  windows. 
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•  Boundaries  not  readily  evident  as  a  change  in  brightness  are  often  visible  as  a  change  in 
the  reflected  epeetrum.  Adding  an  additional  dimension,  spectral  frequency,  often 
samole<)  by  discrete  channels  such  as  R(x.y).  G(x.Y).  and  B(x,y),  can  provide  information 
on  the  magnitude  and  direction  of  spectral  "edges"  or  "regions".  As  models  generate 
increasingly  detailed  questions  about  the  image,  reporting  criteria  can  be  narrowed  to 
inquiry  about  shifts  from  a  specific  spectral  reference,  or  m  specific  image  regions. 

•  Temporal  changes  between  frames,  or  motion  features,  can  be  extracted  using  a  variety 
of  techniques.  We  are  interested  in  exploring  tlie  extension  of  gaussian  operators  to  the 
temporal  dimension  in  analogy  to  spectral  and  temporal  operators.  Temporal  information 
assists  in  both  object/background  separation  and  object  identification  ^ 

•  Macro- operators,  suggested  by  detailed  object  models,  may  combine  several  of  the 
above  possibilities.  Questions  about  the  angle  at  which  a  wmg  meets  the  aircraft  body 
may  be  resolved  by  an  operator  tuned  to  a  particular  angle  described  in  terms  of  the 
soectral/textural  properties  of  the  aircraft  and  background.  As  very  narrow  hypotheses 
are  formed,  features  searches  may  be  required  which  could  not  practically  be  anticipated 
a  priori. 

The  computational  expense  of  many  interesting  operators,  and  the  number  of  possible  parameter 
combinations  suggests  a  feature  extraction  task  ideally  suited  to  an  electro-optic  processor.  Yet  even 
assuming  for  a  moment  that  each  feature  possibility  could  be  extracted  instantaneously,  any  digital 
representation  of  the  complete  feature  space  would  result  in  tremendous  organizational  and  storage 
difficulties  in  an  effort  to  make  the  features  available  in  usable  form,  when  the  number  of  possible 
operators,  combinations  Of  parameters,  and  substantial  kernel  sizes  are  taken  into  account,  the  scan- 
in/scan-out  limitations  of  foreseeable  devices  makes  a  priori  computation  of  all  potentially  interesting 
features  unappealing.  Recursive  evaluation  of  the  feature  space  provides  an  alternative  to  a  pnon 
pruning  of  the  feature  space. 

4.7.1 .1  Recursive,  Goal  Driven  Image  Exploration 
Recursive,  goal  driven  image  exploration  allows  a  developing  description  of  the  visual  environment, 
described  in  terms  of  an  object  and  te.rture  data  base,  to  select  the  particular  --egiors  of  the  feature 
space  which  are  evaluated  in  an  interactive  fashion,  Oeoonding  on  the  diversity  of  obtects  in  the 
image,  and  the  amount  of  prior  knowledge  coming  from  other  frames,  several  hundred  cycles  may  be 
recuired  to  converge  on  an  adequate  frame  analysis.  Each  cycle  consists  of  a  feature  extraction 
operation  pipelined  into  a  digital  analysis  by  a  multitude  of  processes,  each  in  vestigating  a  particular 
hypothesis  somewhere  in  the  image. 
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Figure  4-21 :  Recursive  image  analysis  system  using  texture  and  structure. 

The  proposed  iterative  structure  is  shovYn  m  figure  4-2i  such  that  structure  and  texture  analysis  are 
unfolded  into  separate  iterative  loops  sharing  the  same  image  frame  descriptors  (center  of  diagram). 
In  an  actual  implementation,  these  might  be  folded  together  into  common  hardware  and  software 
mechanisms. 

In  the  laboratory,  incoming  images  are  received  already  sampled  in  space  (x,y),  spectral  frequency 
(X),  and  time  (t).  Additional  camera  parameters  describing  camera  altitude,  angle,  cloud  cover, 
latitude,  and  longitude  are  supplied  to  the  internal  image  frame  descriptors. 

The  proposed  electro-optic  feature  extractors,  simulated  in  the  laboratory  by  array  processors, 
provide  the  only  access  the  system  has  to  image  pixels.  Early  boot  processes  are  used  to  initiate 
extraction  of  simple  edge  and  region  information  in  order  to  initiate  the  formation  of  image 
hypotheses.  Texture  analysis  has  an  additional  segmentation  step  in  v/hich  the  extracted  texture 
features  are  clustered  into  proposed  regions  i^gam  iteratively. 

Both  structural  and  textural  infcrma:  on  s  summarized  within  image  frame  descriptors.  These 
active  processes  maintain  internal  'eatwf?  cprosentations  in  order  to  service  questions  from 
cooperative  hypothesis  evalur.tio''  c':':'*'. "’t  may  be  indexed  by  one  of  numerous  regional  or 

feature  space  keys. 
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The  dynamic  pool  of  hypothesis  evaluation  processes  are  built  upon  specific  models  pulled  from  a 
hierarchical  obiecl  and  texture  data  base.  Valid  scheduling  requests  inside  the  pool  include 
activation  of  processes  for  the  same  model  at  different  locations  within  the  image  (high  level), 
spawning  of  subprocesses  investigating  specific  possibilities  down  the  obiect  model  hierarchy  (lower 
level),  or  termination  of  hypotheses  which  cannot  be  supported  relative  to  competing  hypotheses  for 
the  same  structure.  All  communication  between  processes  investigating  hypotheses  occurs  through 
the  image  frame  descriptor  processes  or  the  process  scheduler. 

Questions  which  arise  in  the  course  of  trying  to  support  or  deny  a  hypothesis  are  collected, 
condensed,  and  format  prior  to  triggering  feature  evaluation.  Requests  from  numerous  evaluation 
processes  must  be  condensed  into  a  serial  stream  of  feature  requests  such  that  the  expectation  value 
for  feature  evaluation  is.  perhaps  suboptimally,  minimized.  The  proper  feature  operator  and 
parameters  must  then  be  prepared  prior  to  the  queuing  of  an  evaluation  request. 

Two  important  functions  are  intentionally  not  shown  in  the  diagrams.  Data  base  information  must 
somehow  be  acquired  by  the  system,  either  through  structured  learning  or  direct  data  entry.  Image 
analysis  reports  mu®  be  generated  to  provide  system  output  based  on  the  image  frame  descriptors. 
The  reporting  system  may  include  filters  to  forward  very  limited  kinds  of  data.  Central  research  issues 
tied  to  recursive  matching  can  be  explored  without  these  functions,  directing  limited  manpower 
resources  to  tasks  where  basic  research  issues  can  be  addressed. 

4.7.1 .2  An  Example 

The  recursive  matching  structure  can  be  illustrated  by  a  simple  example  describing  the  detection 
and  identification  of  a  parked  aircraft,  as  illustrated  in  figure  4-22  through  4-25  Initial  feature 
extractors,  such  as  low  frequency  MRI  operators  will  help  to  locate  candidate  regions  of  interest. 
Numerous  high  level  obiect  hypotheses  may  arise  from  these  operators.  Although  this  example 
describes  a  structural  analysis>  similar  ooerations  might  be  used  to  describe  a  texture  region. 

Shown  in  figure  4-22,  high  level  analysis,  perhaps  corresponding  to  MRI  levels  k  «  9,  and  k  >  10 
respectively  identify  three  candidate  regions  that  might  describe  an  aircraft,  truck,  building  or  other 
object.  Each  high  level  hypotheses  will  m  turn  activate  a  process  built  around  the  appropriate  high 
level  object  model.  Object  model  evaluation  by  “'>rh  process  will  result  m  numerous  questions  which 
help  to  support  or  deny  the  hypothesis.  Requests'  oy  one  evaluation  process  can  be- expected  to 
provide  clues  to  other  hypothesis  evaluation  processes  since  each  kernel  function  is  run  on  the  entire 
image. 
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Figure  4-24:  For  instance,  the  estimated  wing  angle  lends  support  to 

aircraft  class  3. 

result  in  the  class  two  process  deactivating,  and  the  class  three  process  spawning  processes 
exploring  the  possibility  of  a  DC-8  or  L-1011  aircraft  based  on  engine  cowling  location  and  wing 
features.  Shown  in  figure  4-25,  this  hierarchical  processing  would  continue  until  reporting  terminals 
were  reached. 
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Figure  4-25:  Within  the  process  invectigaiing  Uie  class  tiiree  liypulhesis, 
search  for  the  engine  and  wing  length  estimates  triggers 
investigation  of  the  possibility  that  the  aircraft  may  be  a  DC-iO  or 
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5.  liviAGH  U^!DER3TA^!DI^M•3  TECHNIQUES  FOR  3D 
SCENE  iTifERFRETATION 

5.1  INTRODUCTION 

In  this  chapter,  we  present  results  m  two  aspects  of  the  30  change  cl  itection  task:  the  lov/  level 
problem  of  analytung  images,  and  the  high-level  problem  of  representing,  constructing  and  updating 
the  3D  scene  model.  For  the  low-level  processing  we  dr'senbe  a  nev;  method  of  computing  the 
i^.tereo  correspondences  which  can  be  used  to  determine  the  3D  positions  of  points  from  a  pair  of 
aenal  images-  for  tl-,.j  high-level  ivocossmri.  wo  describe  methods  of  rec’esenting  and  cncoiructing 
sc.jne  models  from  multiple  views,  using  ranrjefmrler  data.  The  use  of  rangefinder  data  allo'./s  us  to 
decouple  the  high-level  processing  problem  from  the  low-level  problem,  for  more  efficient  research 
into  the  high-level  problems. 


5.2  STEREO  BY  TWO-LEVEL  DYNAMIC  PROGRAMMING 


5.2.1  Introduction  - 

Stereo  is  a  u-seful  method  of  obtaining  depth  information.  The  key  problem  in  stereo  is  a  search 
procl.^m  which  finds  the  correspondence  points  between  the  left  and  rignt  images,  so  that,  given  the 
camcia  model  (ie.,  the  relationship  between  the  right  and  left  cameras  of  the  stereo  pair),  the  depth 
can  be  computed  by  triangulation.  In  edge  based  stereo  techniques,  edges  in  the  images  are  used  as 
the  elements  whose  correspondences  to  be  found  [Grimson  and  Marr  79,  Baker  and  Einford 
81 ,  Baker  82,  Barnard  and  Fischler  82].  Even  though  a  general  problem  of  finding  correspondences 
bet'.veen  images  involves  the  search  within  the  whole  image,  the  knowledge  of  the  c.’vmera  model 
simplifies  this  imageto-image  correspondence  .oroblem  into  a  set  of  scanlim^-to-scanline 
correspondence  oroblems.  That  is.  once  a  pair  of  stereo  images  is  rectified  so  that  the  epipolar  lines 
are  horizontal  scanlines,  a  pair  of  corresponding  edges  in  the  right  and  left  images  should  be 
searched  for  only  within  the  same  horizontal  scaniings-  We  call  this  search  intra-zcanlino  search. 
This  iiitra-scanline  search  can  be  treated  as  the  problem  of  finding  a  matching  path  on  a  two- 
dimensional  (PD)  search  plane  whose  veriic.ol  and  horizontal  axe-S  arc  the  right  and  left  soanlines.  A 
dynamic  programming  technique  can  handle  this  search  effici-'iitiy  [Baker  and  Binfprd  81 ,  Baker  82], 

Flowever.  if  there  is  an  edge  extending  across  scanlines,  the  correspondences  in  one  scanline  have 
strong  dependency  on  the  correspondences  ^n  ’he  neighboring  scanlines,  because  if  two  points  are 
on  a  vertically  connected  edge  m  the  ie<t  .  n:  oorrf'spondmg  ppints  .thoiild  most  likely,  lie  on 


a  vertically  connected  cckje  m  the  right  image.  The  mtra  scanlme  cearch  alone  cloea  not  take  into 
account  this  mutual  Hop’^^ulency  between  scanlmes.  Therefore,  another  search  is  necessary  v;hich 
tries  to  linci  the  consistency  among  the  scanlmes.  winch  we  call  mto'  scanime  search. 

By  considering  both  mtra-  and  inter-scanline  searches,  the  correspondence  problem  n  stereo  can 
be  cast  as  that  of  finding  m  a  three-dimensional  (3D)  search  space  an  optimal  matching  surface  that 
mo.st  satisfies  the  mtra-scanline  matches  and  inter-scanline  consistency  Here,  a  matching  surface  is 
defined  by  staciung  2D  matching  paths,  where  the  2D  matching  paths  are  found  m  a  2D  search  plane 
whose  av-vs  an?  left  image  column  position  and  nght-image  column  position,  and  tiu?  stacking  is  clone 
in  tho  direction  of  the  row  (scanlino)  number  of  the  images.  The  cost  of  ttie  matclung  surface  is 
definerJ  as  the  sum  of  the  costs  of  the  intra-scanlme  matches  on  the  2D  search  planes,  while  vertically 
connected  edges  provide  the  consistency  constraint  across  the  2D  search  planes  and  thus  penalize 
those  intra-scanline  matches  which  are  not  consistent  across  the  scanlines.  Our  stereo  matching 
uses  dynamic  programming  for  performing  both  the  intra-scanline  and  the  inter-scanline  searches, 
and  both  searches  proceed  simultaneously.  This  method  reduces  the  computation  to  a  feasible 
amount. 

S.2.2  Use  of  (nlef'scanline  Constraints 

As  mentioned  above,  for  a  pair  of  rectified  stereo  images,  matching  edges  within  the  same  scanline 
(ie.,  the  mtra-scanline  search)  should  be  sufficient  in  principle.  However,  in  practice,  there  is  much 
ambiguity  in  finding  correspondences  solely  by  the  intra-scanline  search.  To  resolve  the  ambiguity, 
we  can  exploit  the  consistency  constraints  that  vertically  connected  edges  across  the  scanlines 
provide  .  Suppose  a  point  on  a  connected  edge  u  in  the  right  image  matches  with  a  point  on  a 
connected  edge  v  in  the  left  image  on  scanline  t.  Then,  other  points  on  these  edges  should  also 
match  on  other  scanlines.  If  edges  u  and  v  do  not  match  on  scanline  i,  they  should  not  match  on 
other  scanlines,  either.  We  call  this  property  inter-scanline  consistency  constraint.  Thus,  our 
problem  is  to  search  for  a  set  of  matching  paths  which  gives  the  optimal  correspondence  of  edges 
within  scanlines  under  the  inter-scanline  consistency  constraint. 

A  few  methods  have  been  used  to  com.bme  the  inter-scanline  search  with  the  intra-scanline  search. 
Henderson  [Henderson,  et  al.  79)  sequentially  processed  each  pair  of  scanlmes  and  used  the  result  of 
one  scanline  to  guide  the  search  in  the  next  scanlme.  However,  this  method  suffers  in  that  the  errors 
made  m  tho  earlier  scanlines  significantly  alfert  the  total  results. 


D.ik^^r  [Baker  82)  first  processed  each  pair  of  ccaniines  indcoendcntly  After  all  the  mtra  ocanlme 
matchiny  was  done,  he  used  a  cooperative  process  to  delect  and  correct  the  matcluncj  results  whicn 
violate  the  consistency  constraints  Since  this  method,  however,  does  not  use  the  inter  scanlme 
constraints  directly  m  the  search,  the  result  from  the  cooperative  process  is  not  guaranteed  to  be 
optinial  Baker  suggested  the  necessity  of  a  search  winch  finds  an  optimal  lesult  satisfying  the 
consii.t^.ncy  constraints  m  a  3D  search  space,  but  a  feasible  method  was  left  as  an  open  problem. 

A  sti  aighlforv/ard  .vay  to  achieve  a  matching  winch  satisfies  the  mter  scanlme  constraints  is  to 
cjn.-(e'  all  mati'.iiinrjs  between  connected  eriges  m  th.e  right  and  left  images,  hiowever  since  the 
t/pic.il  number  of  c.mnocted  jdges  is  -i  few  to  seveml  hundred  m  each  miagi;.  this  brute  force  method 
IS  usually  infeasible. 


dynaini  c 
prognaniining 

t) 


dynamic 
pi*Ogi'a."::::i  ng 


Figu re  5- 1 :  Two  searches  involved  in  stereo  matching 

We  propose  to  use  dynamic  programming,  which  is  used  for  the  intra  scanline  search,  also  for  the 
intcr  scanline  search.  These  two  searches  are  combined  as  shown  m  figure  5-1.  One  is  for  the 
correspondence  of  all  connected  edges  in  the  right  and  left  images,  and  the  other  is  for  the 
correspondence  of  edges  (actually,  intervals  delimited  by  edges)  on  right  and  left  scanlines  under  the 
constraint  given  by  the  former.  The  scheme  to  use  dynamic  programming  in  two  levels  was  first 
employed  in  the  recognition  of  connected  spoked  words  [Sakoe  79],  They  used  one  search  for  the 
possible  segmentation  at  word  boundaries  and  the  other  for  the  time-warping  .vord  matching  under 
the  constraint  given  by  the  former  In  connected  word  recognition,  hov/ever,  the  pattern  to  be 
processed  is  a  single  iO  vector.  In  our  case  a  connected  edge  crosses  over  multiple  scanlincs  (ie., 
ID  vectors).  This  means  that  we  need  a  3D  search  space  which  r.  a  stack  of  2D  search  pl  inos  for 
intra-scanlino  matching. 


Dynamic  proyramminy  [Aho.  l-lopcroft  and  Ullman  74]  solves  an  \  stage  decision  process  as  V 
sinyle  stayo  processes  This  reduces  the  computational  complexity  to  the  loyanthm  of  the  original 
combinatorial  one.  In  order  to  apply  dynamic  programming,  however,  the  oiigmal  decision  process 
must  satisfy  ifie  following  two  requirements.  First,  the  decision  stages  must  be  ordered  so  that  all  the 
stages  whose  results  are  needed  at  a  given  stage  have  been  proccs.'-.ed  before  then.  Second,  the 
decision  process  should  be  Markovian-  that  is.  at  any  stage  the  behavior  of  the  process  depends 
solely  on  the  current  state  and  does  not  depend  on  the  previous  history.  It  is  not  ubvious  whether 
these  properties  exist  m  the  problem  of  finding  correspondences  between  connecterl  edges  in  stereo 
images,  but  we  clarify  them  m  the  following  sections. 

5.2.3  Correspondence  Search  Using  Dynamic  Programming 
5.2.3. 1  Intra-scanline  search  on  20  plane 

The  problem  of  obtaining  correspondences  between  edges  on  the  right  and  left  epipoiar  scanlines 
can  be  solved  as  a  path  finding  problem  on  a  20  plane  Figure  5-2  illustrates  this  2D  search  plane. 
The  vertical  lines  show  the  positions  of  edges  on  the  left  scanline  and  the  horizontal  ones  show  those 
on  the  right  scanline-'  We  refer  to  the  intersections  of  those  lines  as  nodes.  Nodes  in  this  plane 
correspond  to  the  stages  in  dynamic  programming  where  a  decision  should  be  made  to  select  an 
optimal  path  to  that  node.  In  the  intra-scanline  search,  the  stages  must  be  ordered  as  follows:  When 
we  examine  (he  correspondence  of  two  edges,  one  on  the  right  and  one  on  the  left  scanline,  the 
edges  which  are  on  the  left  of  these  edges  an  aach  scanline  must  already  be  processed.  For  this 
purpose,  we  give  indices  for  edges  in  left-to-right  order  on  each  scaniine:  (0:.Uj  on  the  right  and  [0:.V] 
on  the  left.  Both  ends  of  a  scanline  are  also  treated  as  edges  for  convenience.  It  is  obvious  that  the 
condition  above  is  satisfied  if  we  process  the  nodes  with  smaller  indices  first.  Legal  paths  which  must 
be  considered  are  sequences  of  straight  line  segments  from  node  (0.0)  at  the  upper  left  corner  to 
node  (.U,W)  at  the  lower  right  corner  on  a  2D  array  [0:.1/.0:,VJ.  They  must  go  from  the  upper  left  to  the 
lower  right  corners  monotonically  due  to  the  above-mentioned  concition  on  ordering.  This  is 
equivalent  lo  the  non-reversal  consti  airii  m  edge  correspondence:  that  is.  the  order  of  matched 
edges  has  to  be  preserved  in  the  right  and  left  scanlines.  This  constraint  excludes  from  analysis  thin 
objects  such  as  wires  and  poles  which  may  result  in  positional  reversals  m  the  image.  A  path  has  a 
vertex  at  node  m=(m.n)  when  right  edge  m  and  left  edge  n  are  matched. 

The  cost  of  a  path  is  defined  as  follows  ^et  Dmt  k)  be  the  minimal  cost  of  the  partial  path  from 
node  ft  to  node  m.  We  denote /)(rn  fri  as  -m'  .-^nen  k:s(0.0).  /Jfm)  is  the  cost  of  the  optimal  path  to 
node  m  from  the  origin  (IIU).  The  cost  -J  i  r.-"'-  z  the  '.um  of  those  of  it,',  primitive  paths.  A  primitive 
path  is  a  paitial  path  which  contains  no  :  jc  j."d  it  is  represented  by  a  straight  ime  segment  as 
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Figure  5*2:  20  search  plane  for  intra  scanline  search. 

Intensity  prolilcs  are  shown  along  each  axis. 

The  horizontal  axis  coirosponcts  to  the  left  scar.  ’^e  and  the  vertical  one 
corresponds  to  the  right  scanline.  Vertical  and  horizontal  lines  are  the  edge 
positions  and  path  selection  is  done  at  their  intersections. 

shown  in  figure  5-2.  It  should  be  noted  that  a  primitive  path  actually  corresponds  to  matching  the 
intervals  delimited  by  edges  at  the  start  and  end  nodes  rather  than  edges  themselves.  Let  dim. k)  be 
the  cost  of  the  primitive  path  from  node  kto  node  m.  Obviously,  dim.k)  >  D(m.k)  and  on  an  optimal 
path  (/(m.k)  =  Dim.k). 

Now.  Dim.k)  can  be  defined  recursively  as: 

Dim.k)  =  min  [di m.m~  i )  ~  D[m  —  i.k)} 

in 

D{k.k)=Q  (5.1) 

where  m  =  im.n).  ;  =  »  =  (i.y), 

0<j<';-  -  *0. 

Vector  /  represents  a  primitive  pc  ■  ;  ■  -  :c  "I'c'e  m  \A/hen  ;  =  0,  the  primitive  path  is  horizontal,  as 
shown  at  (a)  in  figure  5-2.  It  cc”  ';  :s  ’p  '"e  case  in  which  a  visible  part  in  the  left  imago  is 
occluiled  in  the  right  image,  .v:  c  -•  "  niive  path  is  vertical,  as  shown  at  (b).  When  />1 
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ancl/or  /  >  1.  the  prmiitive  path  skips  or  njiujres  beyond  -  1  and/or  /  -  !  edges  on  the  righ:  and/or  loft 
scanlines  as  sliown  aMc)<«i  the  figure  Such  a  path  corresponds  to  the  cane  where  some  edges  have 
no  corresponding  ones  on  tlie  other  scanime  because  of  noise. 

The  oath  with  cost  D{M.O)  gives  the  optimal  correspondence  between  a  pair  of  scanlines. 

5.2.4  Inter  scanline  searcli  in  30  space 

Tho  prcL'lom  of  obtaining  a  correspondence  between  edges  under  the  inter- scanime  consistency 
constraintr.  can  be  viewed  as  the  problem  of  finding  a  sot  of  paths  in  a  3D  apace  which  is  .i  stack  of  2D 
planes  for  mtra  scdnlino  nearch  Figure  5-3  illustrates  this  2D  space  Tho  sioe  laces  of  this  space 
correspond  to  the  right  and  'eft  images  of  a  stereo  pair.  The  cost  of  a  set  of  paths  is  defined  as  tlie 
sum  of  tfie  costs  of  the  individual  paths  m  the  set.  We  want  to  obtain  an  optimal  (ie..  the  miminal  cost) 
set  of  paths  satisfying  the  mter-scanline  constraints.  A  pair  of  connected  edges  m  the  right  and  left 
images  make  a  set  of  20  nodes  in  the  30  space  when  they  share  scanline  pairs.  We  refer  to  this  set  of 
2D  nodes  as  a  single  2D  node.  The  optimal  path  on  a  20  plane  is  obtained  by  iterating  tho  selection  of 
an  optimal  path  at  each  20  node.  Similarly,  the  optimal  set  of  paths  in  a  30  space  is  obtained  by 
iterating  the  selection  of  an  optimal  set  of  paths  at  each  3D  node.  Connected  edges,  3D  nodes,  and 
sets  of  paths  between  30  nodes  are  illustrated  in  figure  5-3. 

As  described  in  section  5.2.2,  the  decision  stages  must  be  ordered  in  dynamic  programming.  In  the 
intra-scanline  search,  their  ordering  was  straightforward;  it  was  done  by  ordering  edges  from  left  to 
right  on  each  scanline.  A  similar  consideration  must  be  given  to  the  inter  scanline  searcli  in  30  space 
where  the  decision  stages  are  the  30  nodes.  A  30  node  is  actually  a  set  of  20  nodes,  and  the  cost  at 
a  30  node  is  computed  based  on  the  cost  obtained  by  the  intra-scaniine  search  on  each  2D  search 
plane.  This  leads  to  the  following  condition:  When  we  examine  the  correspondence  of  two 
connected  edges,  one  in  the  right  and  one  in  the  left  image,  the  connected  edges  which  are  on  the 
left  of  these  connected  edges  in  each  imago  must  already  be  processed. 

A  connected  edge  is  said  to  be  on  the  left  of  u„  if  all  the  edges  in  u,  on  the  scanlines  which  and 
Wj  share  are  on  the  left  of  those  in  w,.  Tho  "left  of"  relationship  is  transitive;  if  there  is  a  connected 
edge  and  u  is  on  the  left  of  and  is  on  the  left  of  then  is  on  the  left  of  (if  and  share 
any  scanlines).  The  order  of  hvo  connected  edges  which  do  not  satisfy  both  of  these  relations  may  be 
arbitrarily  specified.  We  assign  an  ordering  index  from  left  to  right  for  every  connected  edge  in  an 
image.  This  ordering  is  possible  without  contradiction  when  a  connected  edge  never  crosses  a 
scanline  more  than  once  and  when  two  connected  edges  never  intcrsnr.t  each  other.  Our  edge- 
linking  process  which  will  bo  explained  n  .-c:.on  4  is  devised  so  that  it  docs  not  produce  oucli  cases. 
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Figure  5*3:  30  search  space  for  intra-  and  inter-scanline  search. 

This  may  be  viewed  as  a  rectangular  solid  seen  from  above. 

The  side  laces  conesoond  to  the  right  and  left  sfereo  images.  Connected 
edges  in  each  image  form  sets  of  intersections  (nodes)  in  this  space.  Each  set 
Is  called  a  3D  node.  Selection  of  a  set  of  paths  Is  done  at  every  3D  node. 


Now  we  will  present  liow  the  cost  of  a  3D  path  is  defined.  Suppose  we  assifjn  indices  f  1  to 
connected  otlges  m  the  nght  image,  and  |0  I  |  m  the  left.  The  left  and  right  ends  of  an  imi.»ge  are 
treated  as  connected  edges  for  convenience;  the  left  ends  are  assigned  index  O  s  Let  u-={i<  )  be  a 
3D  node  made  by  a  connected  edge  u  m  the  right  image  and  a  connected  edge  >  m  the  left  image.  Let 
(  (ul  be  the  cost  of  the  optimal  set  of  paths  which  reach  to  the  3D  node  u  The  cost  (  lui  is 
computed  as  follows; 

iM  Ul 

f  (0>  =  linn  ^ 

I  ■  uul 

\  /XHu.nHu  -  H /);/);/)  1-  ( (u  -  /(/);/)) 

<'l  (5.;) 

((O)  =  0.  i.e..  C(0:/)  =  0  for  all  / 

where  u  =  (u.v).  i(i)-=(t(i),j{t)).  Q<i{i)<u.  0<j(i)<v.  r(/)+;(/)^0. 

Here.  C{u\/)  is  the  cost  of  the  path  on  scanline  r  in  the  optimal  set;  that  is,  C(u)= 

Dim.kj)  is  the  cost  of  the  optimal  3D  primitive  path  from  node  k  to  node  m  on  the  2D  plane  for 
scanline  i.  A  3D  primitive  path  is  a  partial  path  between  two  3D  nodes  on  a  2D  search  plane  and  it  has 
no  vertices  at  the  nodes  belonging  to  a  30  node.  So  a  3D  primitive  path  is  a  c.bam  of  2D  primitive 
paths  and  an  intra-scaniine  search  is  necessary  to  obtain  the  optimal  3D  primitive  path  on  a  2D  plane 
between  two  given  30  nodes.  The  function  Ku  .t)  gives  the  index  of  a  20  node  belonging  to  the  3D 
node  u  on  the  2D  plane  for  scanline  /.  The  numbers  s{u)  and  e{u)  specify  respectively  the  starting 
and  ending  scanlines  between  which  the  3D  node  u  exists.  The  cost  Ciu)  is  minimized  on  the 
function  /■(/).  A  3D  node  u-l(i)  gives  the  start  node  of  the  3D  primitive  path  on  sc.mline  /.  The 
inter-scanline  constraint  is  represented  by  /(/).  For  example,  if  id)  is  independent  of  iii—l),  there 
are  no  constraints  between  scanlines  and  the  search  represented  by  equation  ((5.2))  becomes 
equivalent  to  a  set  of  intra-scaniine  searches  which  are  performed  independently  cn  each  scanline. 
Intuitively,  /(r)  must  be  equal  to  /(f-  1)  in  order  to  keep  the  consistency  constraint. 

The  iteration  starts  at  u  =(0.0)  and  computes  C(o)  for  each  3D  node  u  m  ascending  order  of  u.  At 
each  30  node  the  /(/)’s  which  give  the  minimum  are  recorded.  The  sequence  of  2D  primitive  paths 
which  forms  the  3D  primitive  path  is  also  recorded  on  each  scanline.  The  set  of  paths  which  gives 
C(U)  at  the  3D  node  U  =  (i.y)  (which  13  the  3D  .'.o-.-v  .ormed  by  the  ngnt  ends  of  stereo  images)  is 
obtained  as  the  optimal  set. 

It  should  be  noted  that  when  there  arc  no  connected  edges  except  for  the  right  and  left  sides  of  the 


iinaycs.  the  algorithm  ((5.2))  works  as  a  sot  of  mtra  scanlme  searches  rcpoaicd  on  each  scanlme 
indhDcnclently  In  this  .sense,  the  2D  algorithm  com()lelely  contains  the  2D  one. 

5.2.4. 1  Consistency  constraints  in  inter-scanline 
Using  the  term  3D  node  defined  m  tlie  previous  section,  we  can  decenbe  the  inter  -.ranlme 
consistency  constraints  as  follows'  ^■or  any  3D  .node,  eitt'sf  nn  CQfiescof'U>!'!j  2D  nodes  arc  tt^e 
vvriictis  cm  the  set  ot  itattis  /n  rite  3D  scitren  suacc  or  noite  ol  tnecn  etc  ti'C  .'crtices  un  cet  o( 
;.'afns  VVe  need  to  I eprosont  this  constraint  as  ttie  leiation  between  /i/iand  /i i )  m  eguation  (5  2). 
To  do  this,  lot  us  consider  tfte  e.’.ampie  in  liguro  5  4  Suppose  we  are  ti  /mg  to  obtain  a  set  of  3D 
primitive  paths  which  reach  to  node  u.  In  order  to  satisfy  the  consistency  constraints  above,  all  the 
starting  points  of  these  paths  should  be  the  same  30  node:  that  is  i(i  )  =  i{i—  I).  The  cases  when  the 
starting  point  is  a  different  3D  node  are  shown  as  case  2  and  case  3  m  the  figure,  in  case  2,  a  new  3D 
node  appears  at  scanline  i  and  the  starting  point  changes  to  the  new  one.  Cf  course,  it  is  possible 
that  the  starting  point  does  not  change  to  the  new  3D  node.  This  will  happen  if  the  cost  of  the  paths 
having  vertices  on  the  30  node  is  higher  than  the  cost  of  the  paths  not  having  vertices  on  it.  In  case  3. 
the  3D  node  u— /(/— *1)  disappears  on  scanline  t  and  the  starting  point  is  forced  to  move  elsewhere. 

30  nod*  30  nod*  u 

-  1 

t 

case  1 


Figure  5-4:  Three  cases  for  consistency  constraint. 
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Let  us  denote  the  3D  noiio  u— /(/).  from  which  the  30  tiiimitive  path  ctarts  and  reaches  to  the  3D 
node  u  on  scnnlme  i.  by  ;/"M  u  i).  Then  the  foilowmcj  rules  stioiiid  be  satisfied  m  each  case. 


( <n<T : 

'nn{  u:i—  1 ) 

(  i/m’2: 

fn>i{  ‘niH  U' 

,/);/)  =  //»»;(  u:i~  \  ) 

(5.3 

Ji»H  u.l)  -j 

••iii{_iiiin  u.l—  1 ):/—  1 ) 

The  rules  in  case  2  and  case  3  lequire  that  the  decision  at  3D  node  u  depend  on  decisions  at 
pi  needing  30  nodes.  Unfortunately,  a  Uecisicn  system  with  such  a  property  is  not  M.irkovian  as 
described  m  section  5.2.2.  and  tiiuretore  ttiere  is  no  guaiantee  of  obtiuning  an  optimal  solution  !>•/ 
using  dynamic  programming.  Tins  means  if  we  search  for  a  solution  using  dynamic  programming 
with  those  rules,  the  result  might  be  ooorer  than  that  of  the  2D  algorithm. 

In  order  to  assure  optimality  m  dynamic  programming,  we  modify  the  rules  in  ((5.3))  as  follows. 
easel:  1) 

easel:  /roi(uu)^fnn(uu~- [)  (5.4) 

easel:  frm{u:t)<fiviu\i~\) 


The  new  rule  for  case  2  requires  that  the  new  3D  node  on  scanline  i  be  on  the  right  of  the  3C  nece 
that  is  the  starting  point  on  scanline  (—  1.  For  case  3.  the  new  starting  node  on  scantme  snc.ic  s# 
on  the  left  of  that  on  scanline  /—I.  It  should  be  noted  that  though  the  new  rules  are  aiwa,s  sat  s’*: 
wnen  the  rules  in  equation  ((5.3))  are  sati.sfied.  the  converse  is  not  true.  Thus,  under  ;-p 
the  consistency  constraint  might  not  be  satisfied  at  all  places  In  ether  .vercs  •  » 
represented  by  the  rules  in  equation  ((5.4))  are  weaker  than  those  of  equanen  5  ; 


can  expect  to  ebluin  an  optimal  sdutien  -.n  dynamic  prpgra.mr 


the  3D  search  algorithm  than  by  the*  20  se?rr  i  piijonthm. 


5.2.5  Experiments 

Implementation  of  the  stereo  algcnthm  has  bee  p''95e''te'  ■ 

edges  and  linking  them,  and  a  dc-hr.  •  p"  ;■  -  sr  t,  - 

details  of  the  method  of  detecting  eey---.  ^  i .  •  .  -  -  - 

The  computation  of  cost  m  our  : ; 
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by  edges  in  the  right  and  left  images  on  the  same  scanline.  If  we  let  .  </.  and  h  ...  b,  be  the 
intensity  values  of  the  pixels  which  comprise  the  two  intervals,  then  the  mean  and  variance  of  all 
pixels  in  the  two  intervals  are  computed  as: 
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In  the  definition  above,  both  intervals  give  the  same  contribution  to  the  moan  in  and  variance  a' 
even  when  their  lengths  are  different.  The  cost  of  the  primitive  path  which  matche.3  these  intervals  is 
.defined  as  follows: 

C  =a=VFTF  (5.6) 

P 

We  have  applied  our  stereo  algorithm  to  images  from  various  domains  including  synthesized 
images,  urban  aerial  images,  and  block  scenes.  Only  the  results  of  urban  aerial  images  are  presented 
here. 

The  stereo  pairs  u%ed  here  are  aerial  photographs  of  the  Washington,  D.C.  area.  The  first  stereo 
pair  is  "white  house"  and  the  second  one  is  "pentagon".  They  have  been  rectified  using  the  camera 
models  which  was  computed  by  Gennery's  program  [Gennery  79]  using  manually  selected  point 
pairs. 


Figures  S-S,  S-6.  and  5-7  show  the  original  stereo  pair,  edges  and  connected  edges,  for  the  "white 
house"  scene,  respectively.  The  image  size  is  388x388  pixels  and  the  intensity  resolution  is  8  bits. 
This  example  is  an  interesting  and  difficult  one  because  it  includes  both  buildings  and  highly  textured 
trees.  Many  connected  edges  are  obtained  around  the  building  while  few  are  obtained  in  the  textural 
part.  The  di.sparity  maps  obtained  by  the  20  and  30  search  algorithms  are  shown  in  figure  5-8.  Since 
the  maps  are  registered  in  the  right  image  coordinates,  the  disparity  values  for  pixels  on  the  right  wall 
of  the  central  building,  which  is  visible  in  the  right  image  but  occluded  in  the  left,  are  undetermined. 
Considerable  improvements  can  be  observed  at  the  boundaries  of  buildings.  In  the  textural  part,  the 
two  algorithms  provide  approximately  the  same  results. 


We  counted  the  number  of  positions  where  the  ccrsisrs.-cv  constraint,  described  in  section  5.2.4. 1 
is  not  satisfied.  It  is  436  in  the  20  search  and  32  in  the  30  search.  These  numbers  quantitatively  show 
a  significant  improvement  achieved  by  the  30  search  algorithm.  The  reason  the  inconsistency  is  not 
completely  removed  in  tfie  30  case  s  tliai  wo  used  "weaker"  rules  for  the  constraint  as  described 
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Figure  5-5;  The  "white  houce"  stereo  pair  of  urban  aerial  images. 
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Figure  5*6:  Ed:;es  extracted  from  th*  images  in  figure  5-5. 


Fifur*  9-10;  CXspvity  mop  obtained  forth*  "pontagon"  storoo  pair  (figure  S-9). 

Thit  ii  regiaieretf  in  ttta  imtt  I'mogt  coordineres. 

Afor/c*  that  tha  dataifad  atweturaa  of  tha  building  root 
and  tha  brldga  ovar  tha  highway  (uppar  latt  corner;  hava  baan  recovered. 
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5.3  GENERATING  DETAILED  SCENE  DESCRIPTIONS  FROM 
RANGE  IMAGES 

5.3.1  Introduction 

An  important  problem  for  robotics  vision  is  that  of  generating  a  30  description  of  an  unknown  scene 
from  range  data.  Tlie  range  data  themselves,  a  set  of  30  surface  points,  are  often  not  useful  for  tasks 
such  as  model-based  recognition  and  localization,  model-based  inspection  and  verification,  and 
change  detection. 

The  result  of  our  research  is  a  method  to  extract  a  compact,  symbolic,  three-dimensional 
description  of  polyhedral  objects  in  a  scene.  Importantly,  the  descriptions  are  quite  complete,  that  is, 
most  of  the  visible  faces,  edges,  and  vertices  are  represented.  Most  previous  attempts  at  range  data 
analysis  did  not  result  in  such  complete  descriptions  [Agin  72,  Duda,  Nitzan,  and  Barrett  79,  Oshima 
and  Shirai  79,  Smith  and  Kanade  84,  Tomita  and  Kanade  84].  (An  exception  is  the  work  of  Sugihara 
[Sugihara  79].) 

9.3.2  Approach  • 

The  overall  goal  of  this  research  is  to  obtain  a  full  symbolic  description  of  a  scene  from  range  data 
obtained  from  multiple  views.  In  our  approach,  each  view  is  processed  in  sequence,  and  the  30 
information  obtained  from  each  view  is  used  to  incrementally  construct  a  model  of  the  scene 
environment. 

The  main  steps  followed  by  the  overall  system  are  the  following.  A  description  of  the  scene,  in  terms 
of  faces,  edges,  and  vertices,  is  obtained  from  each  view.  Descriptions  from  separate  views  are  then 
matched  to  obtain  corresponding  elements  and  to  obtain  the  global  coordinate  transformation.  This 
permits  the  separate  descriptions  to  be  merged,  resulting  in  a  more  complete  overall  description  of 
the  scene.  The  matching  and  merging  algonthms  are  described  elsewhere  [Herman  85].  Here,  we  will 
explain  how  the  initial  descriptions  are  obtained. 

Two  general  approachas  for  segmenting  range  images  are  edge/line  extraction  [Smith  and  Kanade 
84,  Tomita  and  Kanade  84,  Sugihara  79]  and  region  extraction  [Faugeras  and  Hebert  83.  Duda, 
Nitzan,  and  Barrett  79,  Oshima  and  Shirai  79].  Our  method  is  based  primarily  on  edge  and  line 
extraction  because  we  are  attempting  to  obtain  complete,  detailed  descriptions  of  the  faces,  edges, 
and  vertices  in  the  rcene.  Furthermore,  our  matching  algorithm  assumes  such  complete  descriptions. 
Such  descriptions  are  more  difficult  to  obtain  when  region  segmentation  methods  are  primarily  used. 
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Our  method  involves  the  following  steps:  (1)  acquire  the  range  images  using  a  light-stripe 
rangefinder.  (2)  ftnct.edgfc  points  in  the  image.  (3)  fit  linear  segments  to  the  edge  points  using  the 
Hough  transform.  (4)  connect  the  segments  by  extendi»>g.  shortening,  or  shifting  them,  (5)  convert  tlie 
lines  and  junctions  in  the  image  to  30  edges  and  vertices.  (6)  generate  faces  from  the  edges  in  the 

scene. 

It  is  interesting  to  note  that  although  we  arc  working  with  30  data,  most  of  the  steps  in  the  algorithm 
are  performed  in  the  20  image  space.  This  is  because  algorithms  for  20  are  often  simpler  and  more 
efficient,  in  both  space  and  tune,  than  those  for  30.  One  example  is  finding  lines  with  the  Hough 
transform.  The  30  version  of  the  algorithm  is  much  more  expensive  and  complicatCHj  than  the  20 
version. 

5.3.3  Range  Data  Acquisition 

The  range  data  we  use  are  obtained  with  the  White  Scanner  light- stripe  rangefinder.  The  illuminator 
is  a  laser  which  projects  a  vertical  plane  of  tight  into  the  scene.  The  intersection  of  the  plane  of  light 
with  an  object  surface  results  in  a  light  stripe,  which  is  imaged  by  a  camera  lying  to  the  left  of  the 
illuminator.  The  further  a  surface  point  on  the  stripe  is  from  the  illuminator,  the  further  to  the  left  it  will 
be  seen  in  the  camera  image.  The  rangefinder  determines  the  position  of  the  stripe  at  each  camera 
scan  line,  and  triangulation  is  used  to  obtain  the  30  coordinates  at  these  positions.  The  result  is 
represented  as  a  column  vector.  When  the  illuminator  is  swept  across  the  field  of  view,  we  obtain  a 
sequence  of  such  column  vectors,  one  for  each  stripe.  The  sequence  of  columns  forms  a  range 
image  (actually  a  set  of  images,  one  each  for  a  binary  mask  and  for  x,  y,  and  z  values). 

Fig.  5-1  ia  shows  the  mask  image  for  a  polyhedral  object.  Each  column  in  the  image  corresponds  to 
a  column  of  light.  The  rows  in  the  image  correspond  to  camera  scan  lines.  This  is  called  a  "mixed 
registration”  [Smith  and  Kanade  84).  The  geometry  in  this  image  cannot  be  treated  as  in  a  camera 
image,  since  it  is  formed  differently.  However,  the  outline  of  the  objects  in  this  image  are  very  nearly 
ihe  same  as  would  be  seen  if  the  eye  were  placed  at  the  illuminator.  The  object  as  seen  from  the 
camera  is  reconstructed  in  Fig.  5-i  ib. 

5.3.4  Threo'dimensional  Edge  Detection 

This  section  describes  how  points  m  the  range  image  tf...r  ..'ise  from  real  scene  edges  are  found. 
We  consider  three  kinds  of  edge  points  ccc  ..rtmg,  convex,  and  concave.  Occluding  edge  points  are 
located  where  there  is  a  discontinuity  n  n  e  the  difference  in  z  values  between  .adjacent  pixels 
exceeds  a  threshold)  or  where  there  is  c  c .  ^  octwccn  data  and  no  data  regions. 
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Convex  and  concave  eilge  points  are  found  by  examining,  in  turn,  each  column  in  the  range  image, 
and  calculating  the  30  cifTvature  at  each  point.  If  the  curvature  is  a  local  maximum  or  minimum  and 
exceeds  a  threshold,  the  point  is  a  concave  or  convex  eilge  point,  depending  on  vi/hether  the 
curvature  is  positive  or  negative.  The  results  of  this  process  are  shown  in  Fig.  5-12  for  two  range 
images,  where  convex  points  are  signified  by  ”  ".  concave  points  by  and  occlurling  points  by 
. .  These  are  the  mixerl  rygistration  images. 

Notice  that  many  occiurlincj  points  have  concave  or  convex  points  ver/  near  them.  Wo  believe  that 
this  is  inherently  due  to  the  thicKness  of  the  light  stripes  [Yoshida  84)  Fig.  5-  13a  shows  u  vertical  light 
stripe  lying  on  a  lace  with  diagonal  bounclanes.  as  seen  from  the  camera.  As  described  above,  at 
each  scan  line,  the  rangefinder  chooses  a  point  (which  is  probably  near  the  center  of  the  stripe 
thickness)  to  represent  the  position  of  the  stripe.  Since  the  stripe's  appearance  is  beveled  near  the 
face  boundaries,  the  center  of  the  stripe  is  shifted.  Since  points  on  a  stripe  that  are  further  to  the  left 
in  the  camera  image  are  assumed  to  arise  from  scene  points  further  from  the  illuminator,  and  vice 
«'e;'sa,  the  measured  tight  stripe  in  Fig.  5-l3a  results  in  a  slight  concavity  near  the  top  of  the  stripe, 
end  a  slight  convexity  near  the  bottom.  In  Fig.  s-i^.  the  results  are  just  the  opposite,  with  a 
convexity  near  the  to{>  of  the  stripe  and  a  concavity  near  the  bottom.  In  Fig.  5-12,  this  phenomenon 
occurs  primarily  on  faces  that  are  highly  oblique  with  respect  to  the  illuminator,  since  the  stripes 
appear  thicker  when  viewed  from  the  camera. 

5.3.5  Fitting  Linear  Segments 

Once  the  edge  points  have  been  found,  we  want  to  fit  linear  segments  to  them.  The  Hough 
transform  [Ouda  and  Hart  72]  is  used  here.  However,  the  straight-forward  method  of  choosing  all  cells 
in  the  Hough  accumulator  whose  values  exceed  a  threshold  was  not  successful  because  clusters 
tend  to  cover  several  cells  and  they  overlap,  resulting  in  several  extracted  lines  for  each  cluster.  To 
get  around  this  problem,  as  soon  as  a  line  is  extracted,  the  effects  on  the  accumulator  of  all  the  edge 
points  corresponding  to  the  line  are  eliminated.  The  algorithm  we  use  is  the  following. 

1.  Transform  each  point  in  the  edge  image  to  a  sinusoidal  curve  in  the  accumulator 
space. 

2.  Choose  the  accumulator  cell  (r.  (7)  with  the  largest  value.  If  the  value  is  less  than  a 
threshold,  exit. 

3.  Find  linear  clusters  of  points  in  the  odge  image  that  represent  lino  segments  along  the 
line  (r.  0).  This  is  dorie  by  searching  for  points  in  the  edge  image  v/ithm  some  thickness  t 
of  the  line  (r,  0).  and  determining  .vnich  of  these  points  cluster  together.  Each  resulting 
line  segment  is  defined  by  i*s  20  enc  pciots.  its  30  end  points,  and  itr.  3U  line  parameters. 
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Figure  5*1 2:  Edge  points  i-egisf^'^ci  with  range  images.  Convex  points  are  "  +  ", 

concave  points  are  '  '  jrn  occluding  points  are  (.i)  Same  image  as  Fig. 
5-113  hi  Srjgo  points  of  another  range  image. 
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Figure  5*13:  Camera  viewpoint  of  vertical  stripes  lying  on  faces  with  diagonal 

boundaries.  The  measured  stripe  petition  results  in  a  concavity  or  convexity 

near  Its  top  and  bottom. 


4.  dliminate  the  effgjt  on  the  accumulator  of  the  points  lying  on  the  line  (r.  0).  An  efficient 
way  to  accomplish  this  is  to  decrement  each  accumulator  cell  lying  on  the  sinusoidal 
curve  corresponding  to  each  point. 

5.  Go  back  to  step  2.  * 

Each  class  of  edge  points  (i.e..  convex,  concave,  and  occluding)  is  treated  separately  and 
independently.  In  this  way.  the  resulting  line  segments  can  be  given  the  same  class  labels.  Also,  the 
direction  of  the  occluding  arrow  for  each  occluding  line  is  determined  as  a  unit  vector  in  the  image 
plane.  (The  occluding  surface  is  on  the  right  side  of  the  arrow.)  This  is  clone  by  comparing  the 
average  z  value.s  of  points  on  either  skIo  of  the  line  segment  that  are  veiy  near  ihe  segment.  The 
results  of  the  line  fitting  are  shown  in  Fig.  5-i4.  where  occluding  lines  are  represented  by  .solid  lines, 
concave  lines  by  dashed  lines,  and  convex  lines  by  dot-dash  lines. 

As  explained  earlier,  some  convex  or  concave  edge  points  may  lie  near  occluding  edge  points.  This 
may  result  in  convex  or  concave  line  segments  near  occluding  line  segments.  These  segments  should 
be  eliminated  since  they  do  not  correspond  to  real  scene  features;  they  are  an  artifact  of  the  range 
finding  process.  Fig.  5*  14  actually  shows  the  result  after  such  segments  have  been  deleted. 

5.3.8  Connect  Lines  and  Form  Junctions 

Although  the  basic  line  segments  forming  the  edges  of  the  object  have  now  beea  extracted,  as 
shown  in  Fig.  5-14,  there  are  still  many  gaps  and  inaccuracies  near  the  junctions  of  the  object.  Our 
next  step  is  therefore  to  fill  in  these  gaps  and  form  junctions  where  necessary.  This  is  done  in  three 
steps.  First,  segments  that  are  close  and  almost  collinear  are  connected.  Second,  (extended) 
segments  that  intersect  and  are  close  are  connected.  Third,  if  a  segment  has  a  dangling  (i.e., 
unconnected)  end  point,  an  attempt  is  made  to  connect  it  with  other  segments  as  in  the  second  step, 
but  using  a  larger  threshold  than  in  this  step. 

In  the  first  step,  if  two  segments  are  aimcst  collinear  and  have  close  end  points  (e.g..  segments  Li 
and  L2  in  Rg.  5-14),  a  junction  is  formed  at  the  point  midway  betv/een  these  end  points  to  connect  the 
two  segments. 

In  the  second  step,  intersecting  pairs  of  (extended)  line  segments  are  connected  if  the  intersection 
point  lies  within  a  given  threshold  distance  c’  tne  end  points  of  the  segments.  We  consider  five  cases 
heie.  In  case  1,  the  intersection  point  les  sutsice  tne  two  segments  (Fig.  5-i5a).  Both  are  extended 
and  a  junction  is  formed.  In  case  2.  ;no  in;e':ec:  an  ooint  lies  inside  both  segments  (Fig.  5-i5b).  Both 
are  shortenud  and  a  junction  is  formec  n  zj^c  2  me  intersection  point  lies  outside  one  segment  but 
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Figur#  5-14:  Result  of  fitting  line  segments  to  the  edge  image  in  Fg. 

5- 1 2b.Solid  lines  are  occluding, 
dashed  lines  are  concave,  and  dot-dash  lines  are  convex. 

inside  the  other  (Fig.  S-lSc).  The  former  is  extended,  the  latter  is  shortened,  and  a  junction  is  formed. 
In  case  4,  the  intersection  point  lies  inside  both  segments  (Fig.  S-lSd),  but  is  beyond  the  threshold 
distance  from  each  end  point  of  one  of  the  segments.  The  other  segment  is  therefore  shortened,  but  a 
junction  IS  not  formed  connecting  the  two  sogments.  Case  5  is  the  same  as  case  4,  except  the 
intersection  point  lies  outside  one  segment  but  mside  the  other  (Fig.  5- 1 5e). 
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Figure's*  1 S: .  Connecting  intersecting  pairs  of  (extended)  line  segments. 


The  thresholds  used  in  the  First  two  steps  just  described  are  conservative  and  are  only  meant  to 
connect  segments  that  are  quite  close  to  each  other.  If  liberal  thresholds  were  usedr  connections 
would  be  established  between  segments  that  should  not  be  connected.  The  result  after  these  first  two 
steps  is  shown  in  Fig.  S-16.  Note  that  the  two  end  points  P1  and  P2  in  the  Figure  seem  to  be 
"dangling.”  In  the  third  step,  therefore,  a  top-down  type  of  process  is  initiated.  We  assume  that  a 
dangling  ond  of  a  segment  should  probably  be  extended  or  shortened  by  a  larger  amount  than  the 
previously  specified  thresholds.  Intersections  between  such  an  (extended)  segment  and  other 
segments  are  obtained,  and  the  same  tests  and  procedures  as  described  in  the  second  step  are 
performed,  except  that  larger  thresholds  are  used. 


As  a  result  of  this  process,  ail  gaps  are  eliminated.  However,  lines  that  should  form  a  single  junction 
often  do  not  intersect  at  a  single  point,  resulting  in  separate  junctions.  To  merge  such  junctions,  a 
rectangular  window  is  placed  at  each  junction  point  in  the  image,  and  all  junctions  within  the  window 
are  replaced  by  a  new  junction  defined  by  the  averaqo  cosition  of  all  the  junctions.  The  result  of  this 
step  is  shown  in  Fig.  5-17.  At  this  point,  partially  occluded  segments  are  labeled  as  such.  These  are 
found  by  checking  how  many  segments  form  each  /unction.  If  a  junction  is  formed  by  only  one 
segment,  the  segment  is  marked  as  incomplete,  in  Fig.  5-17,  segments  L1  through  L5  are  incomplete. 
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Figur*  S*  1 6:  P«sult  of  connecting  segments  in  Fig.  5-14  with 

conservstive  threshold. 


5.3.7  Convert  to  30 

Thus  fsr,  all  of  the  processing  has  been  in  20,  in  the  mixed  registration  image.  In  the  next  step,  ail 
the  junctions  and  segments  in  the  image  are  converted  into  30  vertices  and  edges.  Afterward,  the  30 
faces  in  the  scene  will  be  obtained. 

The  obvioua  method  for  getting  tho  30  pcaition  of  some  point  in  the  l.mugc  ia  to  mcrv:ly  extract  its 
x.y.z  coordinates  from  the  x.y.z  range  images.  The  problem  with  this  method  is  that  it  can  result  in  a 
large  depth  error  even  if  there  is  a  small  error  in  the  20  position  of  the  point.  To  see  why,  consider  Fig. 
5-18.  Suppose  that  the  position  of  a  junction  determined  by  the  methods  described  above  is  at  point 
a.  but  the  true  position  of  the  junction  is  at  point  b.  If  we  obtain  the  30  coordinates  of  point  a  by 
extracting  them  from  the  x.y.z  range  images,  we  would  really  be  extracting  the  coordinates  of  a  point 
that  lies  significantly  inside  face  A.  resulting  in  a  large  error  in  the  z  cooroinate.  The  more  oblique 
face  A  is,  the  greater  the  error. 

To  O'/ercomc  this  problem,  we  use  the  i^nown  3- space  positions  of  the  lines  that  were  initially 
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Figure  5-17;  Result  of  extending  dangling  ends  and  merging 

junctions  in  Fig.  5-16. 

extracted  from  the  edge  image  (Fig.  5- 14),  To  obtain  the  3D  position  of  a  point  in  the  image  known  to 
lie  on  a  given  line  in  3  space,  we  calciiiato  the  intersection  of  the  line  with  the  plane  of  illumination 
corresponding  to  the  column  C  in  which  the  point  lies  (see  Fig.  5-19). 


t 


Each  junction  in  Fig.  5-17  has  a  pointer  to  a  list  of  all  tho  initially  extracted  segments  (Fig.  5-14)  that 
ultimately  led  to  the  junction.  Fig.  5-20  provides  an  example  of  how  the  30  coordinates  ul  a  junction  J 


are  determined.  Suppose  J  was  initially  ui. tamed  by  averaging  the  3  intersection  points  of  the 
segments  Li.  L2,  and  L3.  To  get  its  2D  posii.cn.  J  is  first  protected,  in  2D.  onto  each  of  its  segments. 


Figure  5- 10:  Locating  (lie  .'SO  position  of  a  vertex  directly  from  range  data 

coinetimes  results  m  a  large  depth  error 


Figure  S«  1 9:  Calculating  the  intorsection  of  a  line  in  3-space  with  the  plane  of  illumination 
resulting  in  the  prujecloO  points  pi.  p2.  and  s3  Assuming  that  each  of  these  points  lies  on  its 
respective  line  in  O-space.  their  30  positions  are  outumed  as  described  shove.  The  .'!0  position  of  the 
vertex  corrcr^tonding  to  J  is  then  obtamoj  Cf  averacjnig  the  30  positions  of  php2.  and  p3. 


After  each  junction  hns  been  convert',  d  r-c  i  X  /(.nox  a  is  simple  to  oLtum  the  30  line  parameters 
of  the  edijes  connectirvj  these  vertices 
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Fi«ur«  S*20:  Determining  me  30  coordinetee  of  lunetion  J 


S.3.0  Ceneraio  30  Facee 

The  next  step  ia  to  extract  the  3D  faces  m  the  scene.  Since  a  face  may  be  defined  by  (he  edges  mat 
bound  it.  faces  are  found  by  following  (or  traversing)  their  chains  of  edges.  The  edge  traverstil  is 
arbitrarily  chosen  to  be  docKwise  and  is  two-dimensional,  mat  is.  it  occurs  m  m«  plane  of  mo  image. 

Because  an  occluding  edge  belongs  to  one  visible  face,  such  edges  must  be  traversed  exactly 
once,  and  only  in  the  direction  of  the  occluding  arrow  Because  concave  and  convex  edges  belong 
to  two  visible  laces,  such  edges  must  nc  traversed  twice,  once  m  eoch  direction.  Fig  5-^^  snows  how 
the  fcccs  of  I'm  obiect  may  bo  recovered  by  traversing  me  edges.  Wo  rlistmguisli  two  shtsses  of  faces, 
mose  with  all  their  edges  visible  (e  g.,  face  a  m  Rg.  5-21)  and  mose  witii  partially  or  tdi.siiy  occluded 
edges  (o.g..  (ace  B  in  Fig.  5-2t).  Faces  m  the  first  class  are  found  by  a  complete  traversal  of  their 
edges.  Tlie  traversal  cun  therefore  begm  />ith  any  sdge  on  me  face  Smee  faces  m  me  second  class 
are  found  by  a  partial  traversiil  of  thcr  ceges  wc  must  make  certain  that  the  oi.ige  with  which  a 
traversal  Ijogtns  wiB  pormit  all  visible  ec.''  .c tc  be  mctudori. 

A  face  iruversai  occurs  us  follows  f  c  iHcd  the  ''seed"  .ge.  is  h  itong  with  a 
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Fifeur«8*21:  R«cov«nng  th«  facM  by  traversing  th«ir  tdgM. 
traversal  direction  along  that  edge.  The  method  of  choosing  the  edge  and  direction  wtM  be  described 
siiortly.  To  obtain  die  next  edge  m  the  traversal,  a  simple  test  is  used  to  determine  the  next  most 
clochwise  edge  in  the  traversal  direction.  In  Fig.  S  21.  for  example,  if  the  edge  FG  is  lie  seed  edge 
and  the  tra'rersal  direction  is  from  F  to  G.  then  the  next  most  clockwise  edge  is  GH.  and  its  trsversal 
direction  is  from  G  to  H.  Successive  edges  are  chosen  in  this  manner  until  one  of  the  following 
conditions  is  met  (i)  an  edge  which  has  previously  been  traversed  in  the  same  direction  is  reached 
(e.g..  a  seed  edge  will  be  reached  again  for  a  totally  viaibie  face),  (2)  an  incomplete  (i.e..  partially 
occluded)  edge  is  reached  (e  g.,  edge  Gl  in  Fig.  5-2i).  (3)  an  occluding  edge  whose  occluding  arrow 
IS  opposite  to  ihe  traversal  direcbon  is  reached.  When  one  of  these  conditions  «  met.  the  traversal  is 
terminated  and  atl  the  traversed  edges  are  assumed  to  belong  to  a  single  face. 

Seed  edges  are  chosen  so  diet  all  faces  with  some  occluded  edges  are  processed  before  faces  with 
no  occluded  edges.  The  algorithm  proceeds  as  follows.  First,  incomplete  occluding  edges  whose 
occluding  arrows  point  away  from  the  mcompieio  portion  of  the  edge  (e  g..  edge  AB  in  Fig.  5- 22a)  are 
fourxj.  Each  of  these  edges  is  used  as  a  seed,  and  the  traversal  direction  is  that  of  the  occluding 
arrow.  In  Fig.  5-22a,  the  clockwise  traversal  results  in  the  edges  AB.  QC.  and  CO.  which  are  used  to 
form  face  F.  Next,  each  incompieie  conve<  aod  concave  edge  <e.g  .  edrjes  AO  and  EF  m  Fig.  5  22b)  is 
used  as  a  seed,  and  the  traversal  direction  c  ‘'on  :><«:  incomplete  to  the  compicto  portion  of  the  edge. 


Figure  5*22:  Finding  wed  edges  to  uw  in  traversing  feces. 

Then,  for  each  occluding  edge  m  the  scene  (e  g.,  edge  aB  in  Rg.  5- 22c).  find  the  next  most  ciocKwiw 
edge  as  if  traversing  the  occluding  edge  m  the  direction  ooposite  to  its  occluding  arrow,  if  this  next 
edgo  IS  concave  or  convex  (e  g.,  edge  BC  m  Fig.  5-22c).  it  is  used  as  a  seed,  and  the  traversal 
direction  is  the  same  direction  used  to  find  the  edge. 


At  this  point,  the  algorithm  processes  faces  with  no  occluded  edges.  First,  each  complete  occluding 
edge  that  has  not  yet  been  traversed  (e  g  .  AB  m  Fig.  S-22d)  is  used  as  a  seed,  with  the  traversal 
direction  the  same  as  that  of  the  oedudmg  arrow  Rnally.  each  complete  concave  or  convex  edge 
that  has  not  been  traversed  m  both  directions  ie  g  CA  in  Fiq.  o  22e)  is  used  as  a  seed,  traversing  in 
the  direction(s)  not  yet  traversed. 


Each  set  of  traversed  edges  is  used  tc  'cr'^  i  smgie  face,  and  the  3D  positions  of  the  vertices 
connecting  these  edges  are  used  to  cctu  "'c  ciane  equation  of  the  face.  Although  our  current 
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techniques  will  find  inner  edge  chains  (e  g.,  those  that  bound  a  hole  m  a  face)  as  well  as  outer  ones, 
the  two  sots  of  chainTwiiTnot  be  associated  together  as  belonging  to  a  single  face.  Fig.  5  23  siiows  a 
perspective  view  of  the  final  30  description  generated  from  the  Ime  drawing  of  Fig.  S- 17. 


Figure  5*23:  Perspective  view  of  final  30  description  generated  from  Fig. 

5-17. 


5.3.9  Mulliple  Views 

The  processing  result  for  the  edge  image  in  Fig.  5-1 2a  is  shown  m  Fig.  5-247  The  final  30 
reconstruction  is  shown  in  Fig.  5-25.  The  two  sets  of  range  images  discussed  m  this  paper  are  two 
views  of  the  same  object.  The  next  step  in  the  processing  will  involve  matching  the  two  30  models  and 
merging  them  so  as  to  generate  a  more  complete  model.  The  matching  algorithm  matches  vertices  in 
the  two  descriptions,  and  propagates  constraints  through  the  edges  and  faces.  This  is  one  reason 
why  it  has  been  important  to  recover  almost  all  vertices,  edges,  and  faces  m  the  scene. 


5.4  SUMMARY 

This  Chapter  has  presented  results  m  both  low-level  and  high-level  aspects  of  the  30  change 
detection  task.  For  low-level  processing,  a  new  method  of  determining  stereo  correspondences 
which  are  used  in  the  computation  of  depth  for  a  pair  of  aerial  images  was  described.  For  high-level 
processing,  we  have  described  our  methods  of  representation  and  construction  scene  models  from 
multiple  views.  We  have  bypassed  the  luw  level  problems  by  using  rangefinder  data  as  our  input  for 
the  high-level  processing. 
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7.  SUk'lMARY 

In  Chapters  2  5.  we  have  clescnljerl  our  progress  tov/anls  arliievin<)  i  comhination  of  pattern 
racognrhon  image  understanding,  and  ailtftciai  intelligence  tecl^nigues  for  epacu  based  image 
processing.  uo<ng  both  optical  and  digital  or(i(.  es.sing  meihoils.  We  have  achieved  results  m  the  areas 
of  optical  leatufo  e'* traction  ami  sub-pixel  t  uget  detection,  hybrid  digiial/optical  representation  and 
matching,  ami  model-based  three-dimensional  scene  interpretation.  The  remainder  of  this  chapter 
•.’mimanzcb  the  results  achieved  over  the  past  year. 


OPTI 


AL  FIIATllrlEEX 
DETZCTICN  HIGHLIGHTS 


7i:;\CTION  /  flD  Slil3-PIXcL  TARGET 


The  highlights  cf  optical  feature  extraction  v/ork  include  : 

•  A  new  optical  processor  for  detection  of  in-plane  distortion  parameters  from  optically 
generated  chord  distributions. 

•  A  new  optical/digital  moment  processor  concept 

•  A  new  hierarchical  non-ad-hoc  tree  structure  formulation. 

•  Successful  initial  tests  of  the  moment  processor  on  ship  and  pipe  part  data  bases 

•  Promising  initial  quantifications  of  the  accuracy  to  which  the  distortion  parameters  of  the 
obiect  can  be  produced  in  the  hybrid  moment  processor. 

•  Development  of  new  correlation  SOFs. 

•  Promising  initial  ATR  test  results  on  correlation  SOFs. 


The  highlights  of  our  sub-pixol  fast  time  change  detection/recognition  effort  include  ; 

•  A  mere  iinific'd  and  accurate  image  ‘joneration  software  for  produemn  detector  images 
containing  sub-pixel  moving  targets,  correlated  noisc  and  uncorrelatud  noise. 

•  Detailed  quantitative  results  of  the  performance  of  sub-pixel  shift  estimators. 

•  Detailed  quantitative  results  of  the  performance  of  various  interpolation  schemes. 

•  Introduction  of  a  new  and  better  c^jiformance  measure  for  the  characterization  of 
background  suppression. 

•  Analytical  and  experimental  invert  itiorr  of  the  use  of  double  differencing  for 
backgrniinri  .suppression. 

•  Initiaf  forinul.'iiion  of  the  mortr  j-ri!  .r.c-’/tim;'  fillpriny’’  to  -rnhance  the  sub-pixel 
ktrget  am  I  -..upiirccs  th-*  b  ickgrounc 


I 
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•  Investigation  of  the  effects  of  detector  iimitutions  such  as  limited  dynamic  range  and 
detector  noisr  • 

•  Initial  efforts  of  mulli  r>}«jion  image  generation. 

7.2  ALGORITHMS  FOR  HYBRID  DIGITAL/OPTICAL 
REPRESENTATION  AND  MATCHING 

This  phase  of  the  protect  has  focussed  on  the  development  and  evaluation  of  methods  which  yield 
rrprecentationr,  of  structural  and  textural  information  m  an  image,  and  may  be  used  for  matching 
imnnes  to  scene  models  The  principal  results  achieved  m  this  research  include: 

•  i^robabiiistic  Graph  Mulching  ■  Attributed  graph  structures  are  used  as  models  of 
structural  and  statistical  information  m  the  image.  Matching  of  these  graph  structures 
using  probabilistic  similarity  methods  poses  a  number  of  interesting  problems  in  the 
mathematical  formalism,  m  the  computational  matching  algorithms,  and  m  the  application 
of  these  methods  to  real  images.  We  have  investigated  methods  of  subgraph 
decomposition  which  permit  branch -and  bound  search  of  the  matching  tree  and  provide 
efficient  pruning  of  the  possible  matches. 

•  Multipit  Rtsoiution  Rotation-invanani  Qoaratora  •  The  MRl  (Multiresolution  Rotation 
Invariant)  operator  and  the  MRO  (Multiresolution  Difference)  transform  have  been 
introduced  to  extract  structural  and  textural  features  of  images  for  use  in  matching  and 
interpretation  phases  of  analysis.  The  MR!  is  a  complex  operator  derived  from  derivative 
expansions  of  Gaussian  Kernels  and  will  have  magnitude  of  response  independent  of 
feature  orientation  and  phase  sngie  of  response  which  provides  information  about 
orientation.  The  spatial  and  frequency  domain  properties  of  these  operators  have  been 
studied  and  an  approximate  MRi  operator  which  uses  difference  of  shifted  Gaussian 
Kernefs  has  been  derived  and  shown  to  be  computationally  efficient  due  to  the  scaling 
and  shift  properties  of  the  Gaussian  Kernel.  The  MRI  operators  have  been  applied  to 
aerial  images  of  objects  and  textures. 

•  Texture  Analysis  -  The  MRI  operators  described  above  have  been  used  to  characterize 
and  classify  textures  from  aerial  images.  This  set  of  multiresolution  operators  permits 
classification  of  texture  independent  of  the  size  and  orientation  of  the  texture  pattern 
itself.  The  statistical  distribution  of  the  magnitude  responses  is  analyzed  across  the  set  of 
operators  for  regions  of  the  image.  Correlation  with  the  corresponding  magnitude  range 
and  the  corresponding  phase  distribution  provides  information  on  the  relative  scale  and 
the  relative  orientation.  Experiments  on  textures  from  aerial  images  and  textures  from 
simple  patterns  have  been  earned  out  and  compared  to  previous  texture  energy 
operators. 

The  algorithms  studied  in  this  section  '•it.jct  the  ititcrdiscipiinary  nature  of  the  project.  The  MRl 
operators  and  associated  texture  measures  ;ire  particularly  well-suited  to  parallel  or  optical  processor 
implementation.  They  will  be  implemented  evaluated  on  the  array  processor  with  RAPiDbus  host. 
Our  formulation  of  the  recursive  muaa  ’• ,  -  -7  aigonihmc  is  also  iniondcd  for  implciiiontation  on 


this  type  of  architecture  with  exteiisior>s  winch  may  mtccjratc  symbolic  and  numerical  processing 
The  interoctive  use  of  parallel  and  optical  preprocessing  with  hypothesis  formation  and  adaptive 
search  strategies  wii!  tv*  natural  conlmuatio''.  of  the  work  completed. 

7.3  IMAGE  UNDERSTANDING  TECHNIQUES  FOR  3D  SCENE 
INTERFl-iETATION 

Onr  effort  this  /ear  has  resuiletl  m  techniques  doaiing  wiili  two  levels  of  oroct’ssmq  reriiiirerl  tor  the 
i.i:.!'.  of  (li  ;crihin<j  20  scenes  t!ie  20  inuaria  level  dcteclmij  features  .‘.;.ch  a;;  edge;  imt-s  .md 
coiiieis.  m  images,  and  the  30  scene  level  ropiepenting.  coiistriictinq.  .mo  iipdahrig  iho  JO  scene 
•mcd.;l.  'Jur  principal  results  include: 

•  ^toieo  CorrcsDonctcnce  umig  Dynamic  Programming  (2D  imnne  Level)  ■  We  have 
described  a  method  to  match  the  f-tnpoiar  line  pairs  m  a  stereo  pair  and  determine  a 
rather  dense  depth  mop  of  the  scunc.  using  intra-  and  mter-scanline  search. 
iiura-scaniine  search  determines  the  correspondence  between  edges  m  the  same 
scanline  of  the  left  and  right  images  This  search  can  be  treated  os  the  problem  of  finding 
a  matching  path  on  a  20  search  plane  whose  axes  are  the  right  and  left  scanlinas. 
Verticalty  connected  edges  in  the  images  provide  consistency  constraints  across  the  20 
search  piani^.  inter-scanime  search  m  a  30  search  space,  which  is  a  stack  of  the  20 
search  planes,  finds  the  vertically  connected  edges  and  applies  the  constraints.  By 
considering  both  intra-  and  inter-scaniine  searches,  the  correspondence  problem  can  be 
cast  as  that  of  finding  in  a  three  dirriensiunai  search  space  the  rnntchmg  surface  that  has 
the  best  match  scores  from  intra  scanime  search  and  also  satisfies  the  consistency 
constraints  from  inter  scnnlme  search.  This  prublem  is  solved  using  dynamic 
programming  for  both  searches. 

•  Three-Dimensional  Model  Ouilomy  ana  Maintenance  (3D  Scene  Level)  •  We  have 

investigated  model  building  using  rangefinder  data,  which  is  already  three  dimensional, 
bypassing  the  problem  of  generating  a  30  description  from  20  data  We  have  developed 
techniques  for  representing,  constructing,  and  updating  the  scene  model  The  model  is 
in  the  form  of  CO  faces,  liifgos  verticer,  and  thcir  topology  and  geometry  A  range  image 
is  oegmenied  into  edge  points  to  v/ivch  linear  segments  are  fit  The  original  ime 
r-egmentc  arr  roflnod  to  I'lurinm*  n.-r.';  ^  -res  then  fit  to  the  lino  dr.T.vmg,  The  final 

model  is  represented  as  a  graph  m  terms  of  the  symbolic  primitives  me.  'oce.  eh’je.  a.hd 
verfc'*.  Altliough  the  final  Jeuenpt'or'  is  three-dimensional,  moot  of  the  proces  .mg  13 
done  in  the  two-dlmensionul  image  space  Future  work  will  combine  model  information 
to  obtain  a  full  symbolic  description  cf  a  scene  from  range  data  obtained  from  multiple 
viewpoints. 

In  the  future,  v/e  will  continue  our  .vor-  ;n  ocih  high  luvcl  and  low  level  image  procorsmg  that  is 
required  for  the  CD  scene  analycia  tu-:"  C..'  •*'fcrr  .vill  focus  on  analy.;ing  and  extracting  CD  repetitive 
tpAtural  features  from  images,  tt;  "  ig  zu'  ':.''“0  algorithm,  and  representing  and  matefuny  CD 
scone  models. 


•  TeMhire  Aericii  images  o*  urban  scenes  contain  j  ijiqe  amount  ot  texii.-res  made  ol 
reoeiiiiwe  oaitemiT  suen  ns  windows  on  the  buikimy  tees  The  aijiiity  to  find  and 
characterize  such  ie»turcr.  is  esientmi  to  nnatyre  complex  images  ot  man  matle 
structures  '^C  will  sludy  the  problem  Ol  detecting  ;u»d  segmenting  the  regions  rnatle  Ol 
regular  arrays  ol  repeated  putterns  in  images  by  using  ttic  analysis  ol  variation 

•  ^ic'sr  Stereo  is  one  ot  the  most  important  ways  of  extracting  30  features  from  images 
A  lost  robust  stereo  capability  v/ouifl  greatly  enhance  any  30  scene  inierpreiauun 
system,  and  would  result  m  a  significant  step  towards  an  effective  change  detection 
system  We  expect  to  continue  work  on  the  stereo  algorithm  based  on  the  dynamic 
piogrammmg  lechniqoe  described  m  this  report  to  increase  us  spee<l  and  improve  me 
ou.'lil,  of  Its  matching  results  Oi  r  next  step  will  bo  to  mroriioraio  multi  resolution 
lechnituK-a  into  this  aigomh-n  This  shou.  •  improve  maiching  quality  t>^cause  it  is  easier 
and  more  >  .noble  to  matcii  at  lower  rosoiuhon  (smaller)  images  and  the  rePulls  can  be 
prouago.tcd  to  highr^r  resolution  (larger)  images  Soeed  should  also  be  improved,  since 
results  from  smaller  images  can  be  used  tu  limit  the  range  of  search  m  larger  images. 

•  3D  Mooti  Acouisition  ana  Matching  Once  30  features  have  been  extracted  from  the 
images,  they  must  be  accumulated  into  a  coherent  model  and  matched  with  previous 
models  to  determine  whether  changes  m  the  structure  ot  the  scene  have  occurred. 
Matching  «  also  necessary  when  merging  tvro  scene  desenppons  of  the  same  scene, 
perhaps  Obtainsd  from  diftersnt  viewpoints,  into  a  smgls  conaistant  description,  or  whan 
idsntifying  ths^sams  X  oOfseta.  such  as  moving  obisets.  m  diltsfsot  scenss.  We  wiH 
continue  our  investigation  into  this  probism  of  rsconstructmg  and  matching  3D 
descnptions  from  a  dense  depth  map  which  will  be  obtained  either  from  stereo  or  from 
direct  range  finding  sensors. 
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ABSTRACT  A  multi-level  dassitier  for  multi-clasa  3-0  distortion -invariant  recognition  IS  described. 
New  real  imagery  and  distortion  parameter  estimation  accuracy  data  are  presented. 


A  feature  apace  ^ocessor  for  multi-class  distortion-invariant  pattern  recognition  is  detailed  in 
Section  2  A  moment  feature  vector  space  is  considered.  Test  data  [i  2]  on  a  robotic  database  are 
summanztd  in  Section  3.  Results  on  a  ship  database,  using  real  input  imagery  with  references  from 
models  is  prosented  with  attention  to  preprocessing,  distortion  parameter  estimahpn.  and  class 
identification  are  advanced  in  Section  4. 

2^BQC£SSflB 

A  moment  feature  space  is  easily  generated  optically  [3-4.5]  or  digitally  [6|  its  outputs  can  easily  be 
corrected  for  processing  errors  in  p>ost- processing  [3].  Moments  are  jointly  Gaussian  random 
variables  [2]  due  to  sampling  with  respect  to  m-plane  distortions.  Thus,  they  allow  use  of  a  Bayesian 
classifier  and  thus  can  minimize  To  determine  the  class  >  (object  class  c  and  aspect  view  <p)  and 
the  object's  di^ortions  (described  by  a  distortion  parameter  ^  for  each  computed  input  moment 
vector  tji,  we  calculate 

with  calculated  iteratively  (k  is  the  iteration  index)  using 

The  class  1  that  minimizes  (1)  defines  c  and  the  out  of -plane  rotation  angle  (aspect)  (p  of  the  input, 
whereas  C.  providet  estimates  of  translations,  scales,  and  m- plane  rotations.  The  number  of  iterations 
M  can  be  reducad  to  4-6,  ^  s  /  can  be  used  m  (i)  and  (2).  and  ^  m  (2)  calculated  as  an  update  [i.2|. 
This  signifir.anity  reduces  the  computational  )cad  per  class/aspect  i. 
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Tlie  m;i|or  problem  is  the  large  number  of  aspect  classes '  that  need  potentially  be  searched. 

To  relieve  this,  wo  uA  tW5  first-level  estimators  ( 1  2)  to  estimate  the  aspect  (this  is  achieved  by 
I  s  /  /I  ,)  and  class  (a  hierarchical  tree  is  used  for  this,  with  the  node  structure  chosen  from  a 
multi-class  Fislier  protection  and  with  a  two-class  Fisher  discriminant  vector  useJ  per  node).  As  we 
sliow  in  Section  3.  this  reduces  the  number  of  aspect-classes  i  to  be  searched  and  thus  makes  the 
processor  very  computationally  efficient.  A  block  diagram  of  the  system  is  shown  in  Figure  A- 1 


Figure  A‘1 :  Block  diagram  of  a  multi-level  moment  feature-space  classifier 
3.  PIPE  PART  TEST  RESULTS 

Nine  different  pipe  parts  (4  classes)  viewed  from  a  50*  depres.sion  angle  were  digitized  (128  x  128 
pixels)  with  36  images  per  part  (one  image  every  10*  in  aspect)  and  used  as  our  test  database.  Test 
results  are  summarized  in  Table  i.  They  show:  9  out  of  36  references  are  adequate  (Test  i).  Use  of 
the  first-level  estimator  reduces  the  number  of  i  to  be  searched  in  (1)  to  10  (Test  2).  The  number  of 
iterations  k  in  (2)  is  only  6  over  a  large  range  (Test  3)  and  .2  =  i  in  (1 )  and  (2)  is  adequate  (Test  4). 
As  seen  in  Table  A-i,  the  system  of  Figure  A-i  can  correctly  classify  over  97%  of  the  324  images 

(using  only  9  x  4  ■  26  references). 

4.  DISTORTION  PARAMETER  ESTIMATION  ACCURACY 

Related  tests  on  another  database  [2.7]  showed  comparable  performance  and  similar  operational 
parameters.  In  this  database,  the  reference  obiects  were  obtained  from  models  and  in  tests  against 
real-world  IR  images,  excellent  recognition  was  obtained.  The  preprocessing  required  [7]  used  only 
simple  10  and  20  histogram  operations  and  thresholding  (to  maintain  low  computational  overhead). 

We  now  consider  the  class  c.  aspect  (p.  scale  n  and  translation  x^  estimation  accuracy  of  the  system 
for  a  second  five-class  databa.se  (36  images  at  '0*  aspect  intervals  per  class)  using  only  four 
references  per  class.  The  true  object  was  the  80*  .aspect  view  of  the  class  i  image.  A  real  IR  input 
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CORRECT 
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Table  A- 1 :  Representative  Pipe  Part  Data  (Different  Test  Conditions) 

image  (vs.  references  obtained  from  models)  at  a  depression  angle  10°  different  from  that  of  the 
reference  set  was  used  with  real  IR  noise  present  in  the  input.  The  tests  (Table  A-2)  show  perfect 
class  and  aspect  classification  for  -  10~'  (for  A^.  =  0.5,  errors  resulted  as  expected)  and 

excellent  shift  (r^  in  pixels)  and  scale  factor  (a)  distortion  parameter  estimation.  All  distortion 
parameters  were  estimated  within  5%  accuracy,  due  to  the  input  resolution,  noise,  etc.  factors. 


TEST 

NUMBER 

TTOTSC'ALr 

a/PiXEL 
SHIFT  xg 

a/xn 

ESTIMATE 

CLASS/ASPECT 

ESTIMATE 

1.0/0 

1.0/0 

o 

O 

CO 

wBSm 

1.016/14.22 

1/80" 

3 

1.023/23.22 

CD 

O 

o 

4 
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O 

o 

5 

0,75/0 

0,750/0.07 

1/80“ 

6 

0.9/0 

0.90/0.03 

1/80“ 

Table  A-2:  Results  of  Class  and  Distortion  Estimation  Tests 
(True  Class  1.  Aspect  80*) 
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